苦蓝 2023-07-23 14:43 Acceptance rate: 60%

[Python | Web scraping] How to crawl the next page

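My script downloaded only 59 chapters of the novel; the remaining table-of-contents pages were never crawled.
The table of contents has 11 pages in total, but only the first one was crawled.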

```python
import requests
from lxml import etree

url = 'https://www.qb5200.la/book/116524/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}

# Fetch the table of contents (this URL only returns the first of the 11 pages)
res = requests.get(url, headers=headers)
html = etree.HTML(res.text)
chapter_name = html.xpath("//*/dl[@class='zjlist']/dd//text()")
href = html.xpath("//*/dl[@class='zjlist']/dd/a/@href")
base_url = "https://www.qb5200.la/book/116524/"
for i in range(len(chapter_name)):
    # Download each chapter and save its text to a file named after the chapter
    data = requests.get(base_url + href[i], headers=headers)
    html = etree.HTML(data.text)
    content = html.xpath("//*/div[@id='content']//text()")
    with open(f'e:/123/{chapter_name[i]}.txt', 'w', encoding="utf-8") as f:
        for d in content:
            f.write(d.replace("\xa0\xa0\xa0\xa0", '\n'))
```


3 answers

  • cjh4312 2023-07-23 14:51

    It's simple: just add an outer loop over the table-of-contents pages.

    
    import requests
    from lxml import etree

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
    base_url = "https://www.qb5200.la/book/116524/"

    # The table of contents spans 11 pages: index_1.html ... index_11.html
    for n in range(1, 12):
        url = f'{base_url}index_{n}.html'
        res = requests.get(url, headers=headers)
        toc = etree.HTML(res.text)
        # Read the title and the link from the same <a> element so they stay paired
        for a in toc.xpath("//dl[@class='zjlist']/dd/a"):
            name = ''.join(a.itertext()).strip()
            data = requests.get(base_url + a.get('href'), headers=headers)
            page = etree.HTML(data.text)
            content = page.xpath("//div[@id='content']//text()")
            # The with block closes the file, so no explicit f.close() is needed
            with open(f'e:/123/{name}.txt', 'w', encoding="utf-8") as f:
                for d in content:
                    f.write(d.replace("\xa0\xa0\xa0\xa0", '\n'))
            print(f'"{name}" saved')
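
The loop above hard-codes 11 table-of-contents pages, so it would need editing if the book grows. A minimal sketch of one alternative follows: keep requesting page 1, 2, 3, ... and stop as soon as a page contributes no chapter that has not been seen already. `crawl_toc` and `fetch_chapters` are hypothetical names, not part of the answer; `fetch_chapters(n)` is assumed to wrap the `requests.get` and XPath calls above and return a list of `(title, href)` pairs for page `n`. How the site responds to an out-of-range page number is also an assumption (an empty list, or a repeat of an earlier page), which is why the sketch checks for duplicates and caps the page count.

```python
def crawl_toc(fetch_chapters, max_pages=200):
    """Collect (title, href) pairs from TOC pages 1, 2, 3, ...

    fetch_chapters is a hypothetical callable: page number -> list of
    (title, href) pairs. Crawling stops when a page adds nothing new,
    or after max_pages pages as a safety limit.
    """
    seen = set()      # hrefs already collected
    chapters = []     # result, in reading order
    for page in range(1, max_pages + 1):
        added = 0
        for title, href in fetch_chapters(page):
            if href not in seen:
                seen.add(href)
                chapters.append((title, href))
                added += 1
        if added == 0:
            # Empty page, or a page that only repeats known chapters
            break
    return chapters


if __name__ == "__main__":
    # Stand-in for the real site: two TOC pages, then nothing.
    fake_site = {
        1: [("Chapter 1", "1.html"), ("Chapter 2", "2.html")],
        2: [("Chapter 3", "3.html")],
    }
    for title, href in crawl_toc(lambda n: fake_site.get(n, [])):
        print(title, href)
```

Passing the fetcher in as a parameter keeps the stopping rule separate from the network code, so it can be tried against fake data before hitting the real site.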
    


    This answer was selected by the asker as the best answer.
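
One detail the code in this thread leaves open: chapter titles are used directly as file names. If a title happens to contain a character that Windows does not allow in file names (such as `?` or `:`), `open()` would fail, and two chapters with the same title would overwrite each other. The sketch below is one possible guard, not something from the answer; `safe_filename` and `chapter_path` are hypothetical helpers, and the running-number prefix is an added assumption to keep files in reading order.

```python
import re
from pathlib import Path

# Characters Windows rejects in file names, plus tabs and line breaks.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|\r\n\t]')


def safe_filename(title, replacement="_"):
    """Replace forbidden characters and trim surrounding whitespace."""
    cleaned = _FORBIDDEN.sub(replacement, title).strip()
    return cleaned or "untitled"


def chapter_path(out_dir, index, title):
    """Build <out_dir>/<0001>_<title>.txt; the number keeps names unique."""
    return Path(out_dir) / f"{index:04d}_{safe_filename(title)}.txt"


if __name__ == "__main__":
    print(chapter_path("e:/123", 7, "Chapter 7: Why?").name)
```

In the answer's loop this would replace the `f'e:/123/{...}.txt'` expression, with the index coming from a counter that is incremented once per chapter.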


Question events

  • Closed by the system on July 31
  • Answer accepted on July 23
  • Question created on July 23