How to scrape data from a website that uses admin-ajax.php with Scrapy

I am trying to scrape the reviews of Unibet casino from this website: https://casinoplacard.com/unibet-casino-reviews-and-bonuses/

As I did for other review sources, I used Scrapy in Python to scrape the reviews with the code below:

    import scrapy
    from urllib.parse import urlparse


    class slotRunner_spyder(scrapy.Spider):
        name = "slotRunner_spyder"
        start_urls = [
            'https://casinoplacard.com/unibet-casino-reviews-and-bonuses/'
        ]
        count = 0  # number of review blocks found in the downloaded HTML

        def parse(self, response):
            parsed_uri = urlparse(response.url)
            domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)

            # Each user review is expected in its own div.rwp-u-review block
            for review in response.css('div.rwp-users-reviews > div.rwp-u-review'):
                self.count += 1
                yield {
                    'name': review.css('td a::text').extract_first(),
                    'date': review.css('td small::text').extract_first(),
                    'review': review.css('div.rwp-u-review__content > div.rwp-u-review__comment').extract(),
                    'url': response.url
                }
            print(self.count)

But for this website it does not work. To understand why, I added a counter (self.count) and discovered that the loop runs only once, which is not what I expected.

I then spent some time studying the site in DevTools and discovered that when the page loads, an XHR POST request is sent automatically to this URL: https://casinoplacard.com/wp-admin/admin-ajax.php

Looking into that request, I found the data for all 182 reviews under:

Preview >> Data >> Reviews

So could you please help me understand how to retrieve that data?

Thank you very much!


Accepted answer, posted by an anonymous user (普通网友) on 2018-07-12 12:56
I finally found out how to do it. I am sure this is not the best way, but at least it does what I wanted.

As I said in my question, the Preview tab showed all the data I needed, so I just had to get hold of it. I realized that the XHR POST request is sent automatically when the page URL is loaded, so I simply made Python send that request itself.

    import requests

    s = requests.Session()
    # Load the page URL in that session first
    s.get(url)
    # Imitate the XHR POST request
    r = s.post(ajax_URL, data=param, headers=headers)

You get the parameters from the Headers tab of DevTools: the form data shown there is what goes into param. The headers come from the same tab: look for User-Agent and paste its value into headers. The AJAX URL is the one I gave in my question.
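To make the pieces above a bit more concrete, here is a minimal sketch of how the form data and the response could be handled. It is not the exact code I ran: the helper names (`form_data_to_dict`, `extract_reviews`, `fetch_reviews`) are made up for illustration, and the `data` / `reviews` key names are assumed from what the Preview tab shows, so check them against the real response.

```python
from urllib.parse import parse_qsl

PAGE_URL = "https://casinoplacard.com/unibet-casino-reviews-and-bonuses/"
AJAX_URL = "https://casinoplacard.com/wp-admin/admin-ajax.php"


def form_data_to_dict(raw):
    """Turn URL-encoded form data (as copied from DevTools) into a dict."""
    return dict(parse_qsl(raw, keep_blank_values=True))


def extract_reviews(payload):
    """Return the reviews from the decoded JSON body as a list.

    Assumption: the layout matches the Preview tab (data -> reviews);
    the exact key names and casing may differ on the real site.
    """
    reviews = payload.get("data", {}).get("reviews", [])
    # Cope with the reviews arriving keyed by id rather than as a list.
    return list(reviews.values()) if isinstance(reviews, dict) else list(reviews)


def fetch_reviews(session, raw_form_data, user_agent):
    """Load the page, replay the XHR POST, and return the reviews.

    `session` is expected to behave like requests.Session().
    """
    session.get(PAGE_URL)  # load the page first, as the browser does
    response = session.post(
        AJAX_URL,
        data=form_data_to_dict(raw_form_data),
        headers={"User-Agent": user_agent},
    )
    response.raise_for_status()
    return extract_reviews(response.json())
```

With requests installed, calling `fetch_reviews(requests.Session(), raw, ua)` should return the list of reviews, where `raw` is the form data string pasted from DevTools and `ua` is your browser's User-Agent. Passing the session in as an argument also makes it easy to try the parsing with a fake session before hitting the site.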

Hope that will help someone.
Question asked 2018-07-11 12:56. Tags: ajax, php, python