萌新刚刚开始接触爬虫,源网页是个小说网页,应该是没加反爬的机制的https://www.luoqiuzw.com/book/4841/74745706.html
测试了下,只爬单页的话完全没问题,问题点在于构建下一页地址和翻页条件都正确的情况下,无法执行翻页,有没有大佬帮忙看看是什么原因,下面就是主文件的代码
import scrapy
from luoqiu.items import LuoqiuItem
import re
class LuoqiuSpider(scrapy.Spider):
name = 'luoqiu'
allowed_domains = ['luoqiu.com']
start_urls = ['https://www.luoqiuzw.com/book/4841/74745706.html']
def parse(self, response):
item=LuoqiuItem()
item['title']=response.xpath('//*[@id="main"]/div/div/div[2]/h1/text()').extract_first() #获取标题
print(item['title'])#打印标题
item['zhengwen']=response.xpath('//*[@id="content"]/p[2]/text()').extract()#测试,只获取正文1行
print(item['zhengwen'])
next_url='https://www.luoqiuzw.com'+response.xpath('//*[@id="main"]/div/div/div[2]/div[1]/a[4]/@href').extract_first()#构建下一页
print(next_url) #https://www.luoqiuzw.com/book/4841/74745707.html
next_page=re.search(r'\d{8}',next_url).group() #提取下一页的页码7474707 str
print(next_page,type(next_page))#提取下一页的页码7474707 str
if int(next_page) <=74745720:#判断翻页,这边测试就翻几页
yield scrapy.Request (url=next_url ,callback=self.parse)
上面这个是执行结果