As the title says: for a JS-rendered dynamic page, how can scrapy.Request be used to request the next page? The background is that parse() needs to extract the links on each page and then follow them to jump to the next page.
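For reference, the static-page pagination pattern I am trying to adapt is the usual one from the Scrapy tutorial, roughly as sketched below; the spider name, site and selectors are the tutorial's placeholders, not my project's:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com/page/1/']

    def parse(self, response):
        # Scrape the items on the current page.
        for quote in response.css('div.quote'):
            yield {'text': quote.css('span.text::text').get()}
        # Follow the "next" link, if any; urljoin makes the relative href absolute.
        next_href = response.css('li.next a::attr(href)').get()
        if next_href is not None:
            yield scrapy.Request(response.urljoin(next_href), callback=self.parse)

My actual parse(), which additionally renders each detail page with Selenium via the downloader middleware, is: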
# Imports used by parse() (module-level in the spider file); the item and
# middleware import paths depend on my project layout.
import time
from datetime import datetime

import scrapy

from ..items import JobHuntingItem
from ..middlewares import JobHuntingDownloaderMiddleware


def parse(self, response):
    item = JobHuntingItem()
    # Hrefs of the "next" and "last" pagination links on the list page.
    next_page_href = response.css('li[class="next"]>a::attr(href)').extract()
    last_page_href = response.css('li[class="last"]>a::attr(href)').extract()
    # If the "next" link differs from the "last" link, there is a further page to crawl.
    if next_page_href != last_page_href:
        self.xidian_next_page = 'https://job.xidian.edu.cn' + next_page_href[0]
    else:
        self.xidian_next_page = ''
    # Links to the job postings on the current list page.
    c_page_url_list = response.css('ul[class="infoList"]>li:nth-child(1)>a')
    for job in c_page_url_list:
        # Render the JS detail page with the shared Selenium driver.
        driver = JobHuntingDownloaderMiddleware.get_XIDIAN_driver()
        driver.get('https://job.xidian.edu.cn' + job.css('a::attr(href)').extract()[0])
        time.sleep(4)  # wait for the JS-rendered content to load
        item['job_title'] = [driver.find_element('css selector', 'div[class="info-left"]>div>h5').text]
        date_text = driver.find_element('css selector', 'div[class="share"]>ul>li:nth-child(1)').text
        date_text = date_text[date_text.find(':') + 1:]
        # Stop paginating once the posting is older than the cut-off date.
        if datetime.strptime(date_text, '%Y-%m-%d %H:%M') < datetime.strptime('2021-12-03 00:00', '%Y-%m-%d %H:%M'):
            self.xidian_next_page = ''
            break
        item['job_date'] = [date_text]
        views_text = driver.find_element('css selector', 'div[class="share"]>ul>li:nth-child(2)').text
        item['job_views'] = [views_text[views_text.find(':') + 1:]]
        yield item
    if self.xidian_next_page != '':
        yield scrapy.Request(self.xidian_next_page, callback=self.parse)
The problem: after one page has been crawled, the next iteration still crawls the current page instead of moving on to the next one.
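One variant I could try to narrow this down only touches the final yield: dont_filter=True rules out Scrapy's duplicate-request filter, and the log line confirms which URL is actually being scheduled (both are standard Scrapy features; this is just a sketch, not verified on my site):

    if self.xidian_next_page != '':
        self.logger.info('Following next page: %s', self.xidian_next_page)
        yield scrapy.Request(self.xidian_next_page, callback=self.parse, dont_filter=True)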