sinat_38677939
2018-12-09 16:20

Scrapy Request callback is never called even though nothing is being filtered. Any help appreciated!

The spider code looks like this:

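    import scrapy

    # Methods of the spider class; MySpiderItem is the project's own Item class.
    def parse(self, response):
        # Select the <a> elements themselves so each one can be queried in the loop.
        url_list = response.xpath('//a')
        for single_url in url_list:
            href = single_url.xpath('./@href').extract_first()
            name = single_url.xpath('./text()').extract_first()
            if href is None:
                continue  # anchor without an href attribute
            url = 'https:' + href
            yield scrapy.Request(url=url, callback=self.parse_get, meta={'url': url, 'name': name})

    def parse_get(self, response):
        print(1)
        item = MySpiderItem()
        # Values passed via Request(meta=...) are read back from response.meta.
        item['name'] = response.meta['name']
        item['url'] = response.meta['url']
        yield item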
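
The loop in `parse` depends on what the first XPath call returns: `extract()[0]` yields one string, and iterating a string walks its characters, while the loop needs one entry per anchor. A minimal sketch of that difference, using only the standard library as a stand-in for Scrapy selectors; `AnchorCollector` and `SAMPLE_HTML` are hypothetical names invented for this illustration and are not part of the original project:

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page fragment with protocol-relative links.
SAMPLE_HTML = '<a href="//example.com/a">First</a><a href="//example.com/b">Second</a>'

collector = AnchorCollector()
collector.feed(SAMPLE_HTML)
collector.close()

# What extract()[0] hands to the loop: a single string (the first href).
first_href = [href for href, _ in collector.links][0]
chars = list(first_href)  # iterating a string yields single characters

# What the loop needs: one (url, name) entry per anchor.
requests_to_make = [("https:" + href, name) for href, name in collector.links]
```

In a spider, the same idea amounts to looping over the selector list returned by `response.xpath('//a')` and extracting `@href` and `text()` from each element inside the loop.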

The middleware code looks like this:

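    import scrapy
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def process_request(self, request, spider):
        # Method of the downloader middleware class; starts a new Chrome per request.
        self.driver = webdriver.Chrome()
        self.driver.get(request.url)
        if 'anime' in request.meta:
            element = WebDriverWait(self.driver, 10).until(EC.presence_of_element_located((By.ID, 'header')))
        else:
            element = WebDriverWait(self.driver, 10).until(EC.presence_of_element_located((By.ID, 'header')))
        html = self.driver.page_source
        self.driver.quit()

        # Returning a Response makes Scrapy skip its own download and use this one.
        return scrapy.http.HtmlResponse(url=request.url, body=html, request=request, encoding='utf-8')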
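
The middleware above launches and quits a Chrome instance for every single request. This is a separate concern from the callback problem, but it is slow and makes runs harder to follow. A rough sketch of reusing one browser instead; `SharedBrowser` and `FakeDriver` are hypothetical names for this illustration, and `FakeDriver` only mimics the three driver members used here (`get`, `page_source`, `quit`) so the example runs without Selenium:

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver; it only records calls."""

    launched = 0  # how many "browsers" have been started in total

    def __init__(self):
        FakeDriver.launched += 1
        self.page_source = ""
        self.closed = False

    def get(self, url):
        # Pretend the rendered page simply contains the URL.
        self.page_source = f"<html><body>{url}</body></html>"

    def quit(self):
        self.closed = True


class SharedBrowser:
    """Create the driver lazily once, reuse it for every request, quit it once."""

    def __init__(self, driver_factory):
        self._factory = driver_factory
        self._driver = None

    def render(self, url):
        # Start the browser on first use only.
        if self._driver is None:
            self._driver = self._factory()
        self._driver.get(url)
        return self._driver.page_source

    def close(self):
        if self._driver is not None:
            self._driver.quit()
            self._driver = None


browser = SharedBrowser(FakeDriver)
pages = [browser.render(u) for u in ("https://example.com/a", "https://example.com/b")]
browser.close()
```

In a real project the factory would presumably be `webdriver.Chrome`, `render` would be called from `process_request`, and `close` from a handler connected to Scrapy's `spider_closed` signal.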

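I am running this with Chrome. The URLs in the Requests are opened one by one, but parse_get is never called. I have never set allowed_domains, and I also tried adding dont_filter=True to the Request. Since the pages do open, the requests should not be getting filtered out. I am out of ideas, so any guidance would be appreciated!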
