m0_56302292 2023-12-08 13:12 Acceptance rate: 76.5%

Scraping novel chapter content returns nothing

I am trying to scrape the content of a novel, but I get nothing back. The URL may be wrong. How should I change it?


import requests
from bs4 import BeautifulSoup

# Book page that lists the chapters
url = 'https://www.qidian.com/book/1031940621/'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, "html.parser")
chapter_list = soup.find_all("li", class_="clearfix")
chapter_links = []

# Collect the href of every chapter entry
for chapter in chapter_list:
    chapter_link = chapter.a.get("href")
    chapter_links.append(chapter_link)

novel_content = ""

# Download each chapter and append its title and text
for chapter_link in chapter_links:
    chapter_response = requests.get(chapter_link)
    chapter_html = chapter_response.text
    chapter_soup = BeautifulSoup(chapter_html, "html.parser")
    chapter_title = chapter_soup.find("h3").text
    chapter_content = chapter_soup.find("div", class_="read-content").text
    novel_content += chapter_title + "\n" + chapter_content + "\n"

with open("novel.txt", "w", encoding="utf-8") as file:
    file.write(novel_content)

4 answers

  • Leodong. 2023-12-08 13:43

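    This answer combines my own reasoning with a GPT (OpenAI) search and has been verified. The details are as follows:
    Based on the code you provided, the likely problem is that the chapter links are not in the correct format. The href values taken from the chapter list may be relative, and requests.get needs a full URL. The modified code is below: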

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin
    
    url = 'https://www.qidian.com/book/1031940621/'
    response = requests.get(url)
    html = response.text
    soup = BeautifulSoup(html, "html.parser")
    chapter_list = soup.find_all("li", class_="clearfix")
    chapter_links = []
    
    for chapter in chapter_list:
        # Skip list items that contain no link
        if chapter.a is None or not chapter.a.get("href"):
            continue
        # Resolve relative links (e.g. starting with "/") against the base URL
        chapter_link = urljoin(url, chapter.a.get("href"))
        chapter_links.append(chapter_link)
    
    novel_content = ""
    
    for chapter_link in chapter_links:
        chapter_response = requests.get(chapter_link)
        chapter_html = chapter_response.text
        chapter_soup = BeautifulSoup(chapter_html, "html.parser")
        title_tag = chapter_soup.find("h3")
        content_tag = chapter_soup.find("div", class_="read-content")
        # Skip pages that lack the expected elements
        if title_tag is None or content_tag is None:
            continue
        novel_content += title_tag.text + "\n" + content_tag.text + "\n"
    
    with open("novel.txt", "w", encoding="utf-8") as file:
        file.write(novel_content)
    

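    This code turns relative chapter links (for example, ones that start with a slash, /) into full URLs by combining them with the base URL, so that each chapter request goes to a valid address.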
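
To make the link handling concrete, here is a minimal sketch of how relative hrefs resolve against the book page URL using the standard library's urljoin. The href values are hypothetical and only for illustration (I have not checked what the site's chapter list actually contains), and to_absolute is a helper name made up for this example.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.qidian.com/book/1031940621/"


def to_absolute(href, base=BASE_URL):
    """Resolve an href taken from the chapter list against the book page URL."""
    return urljoin(base, href)


# Hypothetical href shapes, for illustration only
examples = [
    "/chapter/abc/123/",                        # root-relative
    "//www.qidian.com/chapter/abc/123/",        # protocol-relative
    "https://www.qidian.com/chapter/abc/123/",  # already absolute
]

for href in examples:
    print(href, "->", to_absolute(href))

# For comparison: plain string concatenation keeps the book path
# and produces a double slash
naive = BASE_URL + "/chapter/abc/123/"
print(naive)
```

Under these assumptions, all three example hrefs resolve to https://www.qidian.com/chapter/abc/123/, while plain concatenation gives https://www.qidian.com/book/1031940621//chapter/abc/123/, which points somewhere else. That is why joining with urljoin is usually safer than adding strings together.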


    If this answer helped you, please click to accept it. Thanks!



Question events

  • Question created on December 8