dongyan8896 2018-07-11 12:56

How to scrape data from a website using admin-ajax.php with Scrapy

I am trying to scrape the reviews of Unibet casino from this website: https://casinoplacard.com/unibet-casino-reviews-and-bonuses/

As I did for other review sources, I used Scrapy in Python to scrape the reviews with the code below:

import scrapy
from urllib.parse import urlparse


class slotRunner_spyder(scrapy.Spider):
    name = "slotRunner_spyder"
    start_urls = [
        'https://casinoplacard.com/unibet-casino-reviews-and-bonuses/'
    ]
    count = 0  # number of review blocks found in the downloaded HTML

    def parse(self, response):
        parsed_uri = urlparse(response.url)
        domain = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)

        # Each user review is expected to be one div.rwp-u-review block
        for review in response.css('div.rwp-users-reviews > div.rwp-u-review'):
            self.count += 1
            yield {
                'name': review.css('td a::text').extract_first(),
                'date': review.css('td small::text').extract_first(),
                'review': review.css('div.rwp-u-review__content > div.rwp-u-review__comment').extract(),
                'url': response.url
            }
        print(self.count)

But for that website it does not work. To understand why, I added a counter (self.count) and discovered that the loop runs only one iteration, which is not what I expected.

I then spent some time studying that website in DevTools and discovered that when the page loads, an XHR POST request is made automatically to the URL: https://casinoplacard.com/wp-admin/admin-ajax.php

Looking into that request, I found the data for all 182 reviews under:

Preview >> Data >> Reviews

Could you please help me understand how to retrieve that data?

Thank you very much!

1 answer

  • Anonymous user 2018-07-12 12:56

    I finally found out how to do it. I am sure this is not the best way, but at least it does what I wanted.

    As I said in my question, all the data I needed was in the Preview tab, so I just had to get hold of it. I understood that the XHR POST request is made automatically when the page loads, so I made Python send that same request itself.

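    import requests

    url = 'https://casinoplacard.com/unibet-casino-reviews-and-bonuses/'
    ajax_URL = 'https://casinoplacard.com/wp-admin/admin-ajax.php'
    param = {}    # form data copied from the DevTools Headers tab
    headers = {}  # e.g. the User-Agent copied from the same tab

    s = requests.Session()
    # Load the page first so the session picks up its cookies
    s.get(url)
    # Imitate the XHR POST request that the page makes on load
    r = s.post(ajax_URL, data=param, headers=headers)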
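
Once the POST succeeds, the reviews still have to be pulled out of the response body. A minimal sketch, assuming the body is JSON and that the DevTools path Preview >> Data >> Reviews corresponds to the keys "data" and "reviews". The key names, the helper extract_reviews and the sample payload are assumptions for illustration, not taken from the site:

```python
import json


def extract_reviews(body_text, key_path=("data", "reviews")):
    """Return the list of reviews found under key_path in a JSON body.

    key_path is an assumption based on the DevTools preview
    (Preview >> Data >> Reviews); adjust it to the real key names.
    """
    node = json.loads(body_text)
    for key in key_path:
        if not isinstance(node, dict) or key not in node:
            return []  # unexpected shape: report no reviews instead of crashing
        node = node[key]
    # Some endpoints return a dict keyed by id instead of a list
    if isinstance(node, dict):
        return list(node.values())
    return list(node) if isinstance(node, list) else []


# Hypothetical payload, used only to show the call
sample = '{"success": true, "data": {"reviews": [{"id": 1}, {"id": 2}]}}'
reviews = extract_reviews(sample)
print(len(reviews))
```

With a real session, the text of the response returned by s.post(...) would be passed in place of sample. Comparing len(reviews) against the 182 reviews seen in DevTools is a quick sanity check.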
    

    The parameters come from the Headers tab of DevTools: the form data shown there is what goes into param. The headers come from the same tab: look for User-Agent and paste its value into headers. The AJAX URL is the one I gave in my question.

    Hope this helps someone.
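
If the form data is copied from DevTools in its raw, URL-encoded form, it can be converted into the dictionary that data= expects rather than retyped by hand. A rough sketch using only the standard library. The field names in raw_form and the User-Agent string are placeholders, not the values this site actually sends, and form_data_to_dict is a hypothetical helper:

```python
from urllib.parse import parse_qsl


def form_data_to_dict(raw_form):
    """Turn a URL-encoded form body (as copied from DevTools) into a dict."""
    # keep_blank_values=True keeps fields that were sent with an empty value
    return dict(parse_qsl(raw_form, keep_blank_values=True))


# Placeholder form body; paste the real one from the request instead
raw_form = "action=hypothetical_action&post_id=123&nonce="
param = form_data_to_dict(raw_form)

# Placeholder header; paste the User-Agent value shown for the real request
headers = {"User-Agent": "Mozilla/5.0 (placeholder)"}

print(param)
```

Converting to a dict keeps only the last value when a field name repeats. In that case the list of pairs returned by parse_qsl can be passed to data= directly, since requests also accepts a list of tuples there.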

    This answer was selected as the best answer by the asker.