How to scrape data from a website that uses admin-ajax.php, with Scrapy

I am trying to scrape the reviews of Unibet casino from this page: https://casinoplacard.com/unibet-casino-reviews-and-bonuses/

As I did for other review sources, I used Scrapy in Python to scrape the reviews with the code below:

import scrapy
from urllib.parse import urlparse


class slotRunner_spyder(scrapy.Spider):
    name = "slotRunner_spyder"
    count = 0  # number of review blocks matched by the selector
    start_urls = [
        'https://casinoplacard.com/unibet-casino-reviews-and-bonuses/'
    ]

    def parse(self, response):
        parsed_uri = urlparse(response.url)
        domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)

        # Each user review is expected to be one div.rwp-u-review element
        for review in response.css('div.rwp-users-reviews > div.rwp-u-review'):
            self.count += 1
            yield {
                'name': review.css('td a::text').extract_first(),
                'date': review.css('td small::text').extract_first(),
                'review': review.css('div.rwp-u-review__content > div.rwp-u-review__comment').extract(),
                'url': response.url
            }
        print(self.count)

But for that website it does not work. To understand why, I added a counter (self.count) and discovered that the loop runs only once, which is not what I expected.

Then I spent some time studying the site in DevTools and discovered that, when the page loads, an XHR POST request is sent automatically to this URL: https://casinoplacard.com/wp-admin/admin-ajax.php

Looking into that request, I found the data for all 182 reviews under:

Preview >> Data >> Reviews

So could you please help me understand how to retrieve that data?

Thank you very much !


1 answer

Anonymous user (普通网友), 2018-07-12 12:56
I finally found a way to do it. I am sure it is not the best way, but at least it does what I wanted.

As I said in my question, the Preview tab contained all the data I needed, so I just had to fetch it. I understood that when the page URL is loaded, the XHR POST request is made automatically, so I simply made Python send that same request.

import requests

s = requests.Session()
# We load the page URL in that session first
s.get(url)
# Here is the imitation of the POST request
self.r = s.post(ajax_URL, data=param, headers=headers)
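The snippet above stops at the POST itself. A minimal sketch of pulling the reviews out of the response body could look like the following. The key names `data` and `reviews` are an assumption based on the "Preview >> Data >> Reviews" path from the question, and `extract_reviews` is a hypothetical helper name, so check the real JSON in DevTools before relying on it.

```python
import json


def extract_reviews(body_text):
    """Return the reviews found in the AJAX response body.

    Assumes the JSON looks like {"data": {"reviews": [...]}}; this layout is
    a guess based on the DevTools preview path, not a documented format.
    """
    payload = json.loads(body_text)
    # admin-ajax.php can answer with a bare 0 or -1 when the request is
    # rejected, so anything that is not a JSON object yields no reviews.
    if not isinstance(payload, dict):
        return []
    data = payload.get("data") or {}
    reviews = data.get("reviews") or []
    # Tolerate a mapping keyed by id as well as a plain list.
    if isinstance(reviews, dict):
        reviews = list(reviews.values())
    return reviews
```

With the session code above, this would be called as `extract_reviews(self.r.text)`.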

You get the parameters from the Headers tab of DevTools: the Form Data section of the request is what goes into param. The headers come from the same tab: look for User-Agent and paste its value into headers. The AJAX URL is the one I wrote in my question.

Hope that will help someone.
This answer was selected by the asker as the best answer.

