helltaker_- 2021-05-30 12:48 Acceptance rate: 25%
Views: 293
Accepted

Scraped text prints correctly, but the file written to disk is empty

import requests
from lxml import etree
import json
# for i in range(1, 4):
#     res = requests.get(f'https://www.51shucheng.net/kehuan/santi/santi{i}')
#     res.encoding = 'utf-8'
#     html = etree.HTML(res.text)
#     titles = html.xpath('/html/body/div/div[3]/div[2]/div[6]/ul//li/a/@title')
#     hrefs = html.xpath('/html/body/div/div[3]/div[2]/div[6]/ul//li/a/@href')
#     for href in hrefs:
#         smalltitle = titles[hrefs.index(href)]
#         print(smalltitle, href)
#         response = requests.get(href)
#         response.encoding = 'utf-8'
#         html = etree.HTML(response.text)
#         text = html.xpath('//*[@id="neirong"]//text()')
#         text2 = ''.join(text).replace('(adsbygoogle = window.adsbygoogle || []).push({});','')
#         with open(f'三体/{smalltitle}.txt', 'w',encoding='utf-8')  as file:
#             file.write(text2)

res = requests.get(f'https://www.51shucheng.net/sidamingzhu/hongloumeng')
res.encoding = 'utf-8'
html = etree.HTML(res.text)
titles = html.xpath('/html/body/div/div[3]/div[2]/div[5]/ul//li/a/@title')
hrefs = html.xpath('/html/body/div/div[3]/div[2]/div[5]/ul//li/a/@href')
for href in hrefs:
    smalltitle = titles[hrefs.index(href)]
    print(smalltitle, href)
    response = requests.get(href)
    response.encoding = 'utf-8'
    html = etree.HTML(response.text)
    text = html.xpath('//*[@id="neirong"]//text()')
    text2=''.join(text)
    with open(f'红楼梦/{smalltitle}.txt', 'w',encoding='utf-8') as file:
        file.write(text2)

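The commented-out code scrapes another novel from the same site and works without any problems.

The code below it scrapes 红楼梦 (Dream of the Red Chamber).

Writing to the file raises no error, but the file is empty when I open it.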

 

 


1 answer

  • CSDN专家-HGJ 2021-05-30 14:33

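    The problem is in the line with open(f'红楼梦/{smalltitle}.txt', 'w',encoding='utf-8') as file: the file name contains ":", which violates the operating system's file-naming rules, so the text never reaches the file you expect. Change it to with open(f'红楼梦/{smalltitle.split(":")[0]}.txt', 'w', encoding='utf-8') as f: (and write with f.write(text2) to match the new variable name), which keeps only the part of the title before the colon.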
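    The fix above drops everything after the colon. A more general alternative is to replace every character that is not allowed in a file name before calling open(). The sketch below assumes the script runs on Windows, which the thread does not state; on NTFS a colon in a path is treated as the separator for an alternate data stream, which would explain why open() raised no error while the visible file stayed empty. The helper name safe_filename and the "_" replacement are illustrative choices, not part of the original answer.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, replacement: str = "_") -> str:
    """Return `title` with characters that are invalid in Windows file names replaced."""
    # Windows also dislikes names that start or end with spaces or dots, so trim them.
    cleaned = _INVALID_CHARS.sub(replacement, title).strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "untitled"
```

    In the question's loop it would be used as with open(f'红楼梦/{safe_filename(smalltitle)}.txt', 'w', encoding='utf-8') as file:, leaving file.write(text2) unchanged. Unlike split(":")[0], this keeps the full chapter title, so two chapters that share the same prefix before the colon do not overwrite each other.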

    This answer was selected by the asker as the best answer.
