O7O7FM 2021-12-14 01:41 Acceptance rate: 100%

Python scraper does not loop through the articles in order

Every file the scraper writes ends up with the same content.
Here is the code:

#coding=GB18030
import urllib.request
from bs4 import BeautifulSoup
import re
import os

urls = [
    "https://www.bilibili.com/read/cv13853928?from=category_0",
    "https://www.bilibili.com/read/cv13900955?from=category_0",
    "https://www.bilibili.com/read/cv14392664?from=category_0",
    "https://www.bilibili.com/read/cv14290608?from=category_0",
    "https://www.bilibili.com/read/cv14269554?from=category_0",
    "https://www.bilibili.com/read/cv14023818?from=category_0",
    "https://www.bilibili.com/read/cv14367119?from=category_0",
    "https://www.bilibili.com/read/cv14310331?from=category_0",
    "https://www.bilibili.com/read/cv14312166?from=category_0",
    "https://www.bilibili.com/read/cv14395382?from=category_0",
    "https://www.bilibili.com/read/cv14340236?from=category_0",
    "https://www.bilibili.com/read/cv14312107?from=category_0",
    "https://www.bilibili.com/read/cv14381493?from=category_0",
    "https://www.bilibili.com/read/cv14312157?from=category_0",
    "https://www.bilibili.com/read/cv14342795?from=category_0",
    "https://www.bilibili.com/read/cv14319354?from=category_0",
    "https://www.bilibili.com/read/cv14381629?from=category_0",
    "https://www.bilibili.com/read/cv14353230?from=category_0",
    "https://www.bilibili.com/read/cv14309947?from=category_0",
    "https://www.bilibili.com/read/cv14369822?from=category_0",
    "https://www.bilibili.com/read/cv14394980?from=category_0",
    "https://www.bilibili.com/read/cv14337802?from=category_0",
    "https://www.bilibili.com/read/cv14365402?from=category_0",
    "https://www.bilibili.com/read/cv14361551?from=category_0",
    "https://www.bilibili.com/read/cv14346357?from=category_0",
    "https://www.bilibili.com/read/cv14398923?from=category_0",
    "https://www.bilibili.com/read/cv14314809?from=category_0",
    "https://www.bilibili.com/read/cv14315884?from=category_0",
    "https://www.bilibili.com/read/cv14361893?from=category_0",
    "https://www.bilibili.com/read/cv14395601?from=category_0",
    "https://www.bilibili.com/read/cv14326983?from=category_0",
    "https://www.bilibili.com/read/cv14324884?from=category_0",
    "https://www.bilibili.com/read/cv14327098?from=category_0",
    "https://www.bilibili.com/read/cv14371294?from=category_0",
    "https://www.bilibili.com/read/cv14350914?from=category_0",
    "https://www.bilibili.com/read/cv14354339?from=category_0",
]

def text_create(name, msg):
    desktop_path = "C:\\txt\\"  
    full_path = desktop_path + name  
    file = open(full_path, 'w',encoding="utf-8")
    file.write(msg)  
    # file.close()
    
filePrefix = 'text'   # file prefix
fileSuffix = '.txt'    # file suffix
fileNum = 31          # number of files
 
for i in range(1, fileNum):
    fileName = filePrefix + str(i) + fileSuffix
    for i in range(1,fileNum):
        i=i+1
        url=urls[i]
        a=urllib.request.urlopen(url)
        htmlstr=a.read().decode('UTF-8')
        soup=BeautifulSoup(htmlstr,'html.parser')
        y=re.compile(r'<p>([\s\S]*?)</p>')
        text=y.findall(str(soup))      
        x=''
        for i in range(0,len(text)):
            x=x+text[i]
            text1=re.sub("</?\w+[^>]*>",'',x) 
            text2=text1.replace("。",'。\n\n\0\0') 
            text_create(fileName, text2)
            




2 answers

  • CSDN专家-HGJ 2021-12-14 01:48

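    Change it like this. Use a single loop, so that file i is written from urls[i]; in the original, the inner loop ran through the URLs again for every file name and overwrote the file each time, which is why all the files ended up identical: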

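    # The imports and the urls list are the same as in the question.
    def text_create(name, msg):
        desktop_path = "F:\\txt\\"
        full_path = desktop_path + name
        # The with-block closes the file, so the text is flushed to disk.
        with open(full_path, 'w', encoding="utf-8") as file:
            file.write(msg)


    filePrefix = 'text'  # file prefix
    fileSuffix = '.txt'  # file suffix
    fileNum = 31  # number of files (must not exceed len(urls))
    y = re.compile(r'<p>([\s\S]*?)</p>')  # compiled once, reused for every page
    for i in range(fileNum):  # one pass per URL: file i is built from urls[i]
        fileName = filePrefix + str(i) + fileSuffix
        url = urls[i]
        a = urllib.request.urlopen(url)
        htmlstr = a.read().decode('UTF-8')
        soup = BeautifulSoup(htmlstr, 'html.parser')
        text = y.findall(str(soup))
        print(text)
        x = ''
        for j in range(0, len(text)):  # j, not i, so the outer index is left alone
            x = x + text[j]
        # Strip the remaining tags and write once, after all paragraphs are joined.
        text1 = re.sub(r"</?\w+[^>]*>", '', x)
        text2 = text1.replace("。", '。\n\n\0\0')
        text_create(fileName, text2)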
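
To see why the nested loops in the question give identical files, here is a small network-free sketch. It is only an illustration under stated assumptions: `fake_pages` stands in for the downloaded articles, a dict stands in for the files on disk, and `nested_loops` / `single_loop` are hypothetical names that mirror the shape of the two loop structures in this thread, not the exact code.

```python
# Hypothetical stand-ins: fake_pages replaces the real downloads, and a dict
# replaces the files on disk (a later write overwrites an earlier one).
fake_pages = [f"article {n}" for n in range(6)]
file_num = 4


def nested_loops():
    """Shape of the loops in the question: every file name sees every page."""
    files = {}
    for i in range(1, file_num):
        file_name = "text" + str(i) + ".txt"
        for i in range(1, file_num):  # reuses i and walks the pages again
            i = i + 1
            files[file_name] = fake_pages[i]  # overwritten on every inner pass
    return files


def single_loop():
    """Shape of the loop in the answer: file i receives page i and nothing else."""
    files = {}
    for i in range(file_num):
        files["text" + str(i) + ".txt"] = fake_pages[i]
    return files


print(nested_loops())  # every file holds the last page the inner loop reached
print(single_loop())   # each file holds its own page
```

With `fileNum = 31` as in the question, the inner loop always finishes on `urls[31]`, so that article is what every file contains.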
    
    
    

    If this helps, please click the Accept button.

    The asker selected this answer as the best answer.
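
As a possible variation on the accepted code, the sketch below pairs each URL with its file name through `enumerate`, so there is no index to get out of step. The helper names `clean_paragraphs`, `save_articles` and `fake_fetch` are made up for this illustration, and the download step is passed in as a function so the example runs without network access. It reuses the two regular expressions from the thread but leaves out the `\0\0` (NUL characters) that the thread's code appends after each line break, on the assumption that they were meant as indentation.

```python
import re
import tempfile
from pathlib import Path

P_TAG = re.compile(r"<p>([\s\S]*?)</p>")  # same paragraph pattern as in the thread
ANY_TAG = re.compile(r"</?\w+[^>]*>")      # same tag-stripping pattern


def clean_paragraphs(html):
    """Join the <p> bodies, drop leftover tags, and break after each '。'."""
    joined = "".join(P_TAG.findall(html))
    return ANY_TAG.sub("", joined).replace("。", "。\n\n")


def save_articles(urls, fetch, out_dir):
    """Write one file per URL; enumerate keeps the index and the URL paired."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, url in enumerate(urls, start=1):
        path = out_dir / f"text{i}.txt"
        path.write_text(clean_paragraphs(fetch(url)), encoding="utf-8")
        written.append(path)
    return written


# Stand-in fetcher (an assumption for the demo): returns canned HTML per URL.
def fake_fetch(url):
    return f"<p>page <b>{url}</b>。</p><p>end。</p>"


with tempfile.TemporaryDirectory() as tmp:
    paths = save_articles(["u1", "u2"], fake_fetch, tmp)
    contents = [p.read_text(encoding="utf-8") for p in paths]
    print(contents)
```

For the real pages, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`, and `out_dir` the `C:\txt\` folder from the question.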

Question events

  • Closed by the system on December 22
  • Answer accepted on December 14
  • Question created on December 14
