m0_56302292 2023-12-08 14:05 Acceptance rate: 76.5%

Error when scraping novel chapter content

Why does the output have no content?


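import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://www.qidian.com/book/1031940621/'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, "html.parser")
# If the fetched HTML has no <li class="clearfix">, chapter_list is empty
# and the loops below never run.
chapter_list = soup.find_all("li", class_="clearfix")
chapter_links = []

for chapter in chapter_list:
    if chapter.a is None:  # skip list items that contain no link
        continue
    # urljoin resolves a relative href against the page URL
    # (plain url + href would produce a malformed address).
    chapter_link = urljoin(url, chapter.a.get("href", ""))
    chapter_links.append(chapter_link)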
novel_content = ""
 
for chapter_link in chapter_links:
    chapter_response = requests.get(chapter_link)
    chapter_html = chapter_response.text
    chapter_soup = BeautifulSoup(chapter_html, "html.parser")
    chapter_title = chapter_soup.find("h3").text
    chapter_content = chapter_soup.find("div", class_="read-content").text
    novel_content += chapter_title + "\n" + chapter_content + "\n"
    
with open("novel.txt", "w", encoding="utf-8") as file:
    file.write(novel_content)

3 answers

  • 虫虫仙人 2023-12-08 14:38

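    Bro, print the intermediate values a few times and you'll see it:
    chapter_list is empty.
    The path you wrote is most likely wrong.
    Rewrite your XPath (the selector) and check again.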
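
The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `find_chapter_links` and the two sample HTML strings are hypothetical, and the real markup served by qidian.com has not been verified here. The idea is to run the selector on the HTML you actually received and see whether it matches anything.

```python
from bs4 import BeautifulSoup
from urllib.parse import urljoin


def find_chapter_links(html, base_url, tag="li", css_class="clearfix"):
    """Return absolute chapter URLs; an empty list means the selector matched nothing."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find_all(tag, class_=css_class):
        anchor = item.find("a", href=True)  # first <a> that has an href
        if anchor is not None:
            links.append(urljoin(base_url, anchor["href"]))
    return links


# Hypothetical markup, used only to show the two possible outcomes.
page_with_chapters = '<ul><li class="clearfix"><a href="/chapter/1/">One</a></li></ul>'
page_without_chapters = '<div id="app"></div>'  # no matching <li> in the HTML

base = "https://www.qidian.com/book/1031940621/"
print(find_chapter_links(page_with_chapters, base))
print(find_chapter_links(page_without_chapters, base))
```

With the real page, passing `response.text` to the function and also printing `response.status_code` and `len(response.text)` shows whether the request itself succeeded and whether the selector finds any chapters in the returned HTML.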



Question events

  • Question created on December 8