O7O7FM 2021-12-14 01:41 Acceptance rate: 100%

Python scraper does not loop through the articles in order

Every file the scraper writes ends up with the same content.
The code is below:

#coding=GB18030
import urllib.request
from bs4 import BeautifulSoup
import re
import os

urls = ["https://www.bilibili.com/read/cv13853928?from=category_0","https://www.bilibili.com/read/cv13900955?from=category_0","https://www.bilibili.com/read/cv14392664?from=category_0","https://www.bilibili.com/read/cv14290608?from=category_0","https://www.bilibili.com/read/cv14269554?from=category_0","https://www.bilibili.com/read/cv14023818?from=category_0","https://www.bilibili.com/read/cv14367119?from=category_0","https://www.bilibili.com/read/cv14310331?from=category_0","https://www.bilibili.com/read/cv14312166?from=category_0","https://www.bilibili.com/read/cv14395382?from=category_0","https://www.bilibili.com/read/cv14340236?from=category_0","https://www.bilibili.com/read/cv14312107?from=category_0","https://www.bilibili.com/read/cv14381493?from=category_0","https://www.bilibili.com/read/cv14312157?from=category_0","https://www.bilibili.com/read/cv14342795?from=category_0","https://www.bilibili.com/read/cv14319354?from=category_0","https://www.bilibili.com/read/cv14381629?from=category_0","https://www.bilibili.com/read/cv14353230?from=category_0","https://www.bilibili.com/read/cv14309947?from=category_0","https://www.bilibili.com/read/cv14369822?from=category_0","https://www.bilibili.com/read/cv14394980?from=category_0","https://www.bilibili.com/read/cv14337802?from=category_0","https://www.bilibili.com/read/cv14365402?from=category_0","https://www.bilibili.com/read/cv14361551?from=category_0","https://www.bilibili.com/read/cv14346357?from=category_0","https://www.bilibili.com/read/cv14398923?from=category_0","https://www.bilibili.com/read/cv14314809?from=category_0","https://www.bilibili.com/read/cv14315884?from=category_0","https://www.bilibili.com/read/cv14361893?from=category_0","https://www.bilibili.com/read/cv14395601?from=category_0","https://www.bilibili.com/read/cv14326983?from=category_0","https://www.bilibili.com/read/cv14324884?from=category_0","https://www.bilibili.com/read/cv14327098?from=category_0","https://www.bilibili.com/read/cv14371294?from=category_0","https://www.bilibili.com/read/cv14350914?from=category_0","https://www.bilibili.com/read/cv14354339?from=category_0"]

def text_create(name, msg):
    desktop_path = "C:\\txt\\"  
    full_path = desktop_path + name  
    file = open(full_path, 'w',encoding="utf-8")
    file.write(msg)  
    # file.close()
    
filePrefix = 'text'   # file prefix
fileSuffix = '.txt'    # file suffix
fileNum = 31          # number of files
 
for i in range(1, fileNum):
    fileName = filePrefix + str(i) + fileSuffix
    for i in range(1,fileNum):
        i=i+1
        url=urls[i]
        a=urllib.request.urlopen(url)
        htmlstr=a.read().decode('UTF-8')
        soup=BeautifulSoup(htmlstr,'html.parser')
        y=re.compile(r'<p>([\s\S]*?)</p>')
        text=y.findall(str(soup))      
        x=''
        for i in range(0,len(text)):
            x=x+text[i]
            text1=re.sub("</?\w+[^>]*>",'',x) 
            text2=text1.replace("。",'。\n\n\0\0') 
            text_create(fileName, text2)
            




2 answers

  • CSDN专家-HGJ 2021-12-14 01:48

    Change it as follows, so that a single loop writes file i from urls[i]:

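    import re
    import urllib.request

    from bs4 import BeautifulSoup

    # `urls` is the list of article links from the question.


    def text_create(name, msg):
        desktop_path = "F:\\txt\\"  # output folder; it must already exist
        full_path = desktop_path + name
        # The with block closes the file, replacing the commented-out file.close().
        with open(full_path, 'w', encoding="utf-8") as file:
            file.write(msg)


    filePrefix = 'text'  # file prefix
    fileSuffix = '.txt'  # file suffix
    fileNum = 31  # number of files
    y = re.compile(r'<p>([\s\S]*?)</p>')  # body of every attribute-free <p> tag

    for i in range(fileNum):  # one pass: file i is written from urls[i]
        fileName = filePrefix + str(i) + fileSuffix
        url = urls[i]
        a = urllib.request.urlopen(url)
        htmlstr = a.read().decode('UTF-8')
        soup = BeautifulSoup(htmlstr, 'html.parser')
        text = y.findall(str(soup))
        x = ''.join(text)  # concatenate the paragraph bodies
        text1 = re.sub(r"</?\w+[^>]*>", '', x)  # strip any tags left inside them
        # Break after each 。; the \0\0 from the question is kept, though it writes NUL characters.
        text2 = text1.replace("。", '。\n\n\0\0')
        text_create(fileName, text2)  # write once per article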
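A note on why every file matched in the question's code: the inner `for i in range(1,fileNum)` loop fetches every URL again for each file name and overwrites that same file each time, so each file ends with whatever the last fetched page contained. The sketch below reproduces the pattern without any network access. The `pages` list and the function names `buggy_layout` and `fixed_layout` are hypothetical stand-ins, not part of the original code.

```python
# Hypothetical stand-ins for downloaded articles (no network access needed).
pages = ["article A", "article B", "article C"]


def buggy_layout(pages):
    """Mimic the question: for each file name, loop over every page again."""
    files = {}
    for i in range(len(pages)):
        name = f"text{i}.txt"
        for k in range(len(pages)):
            # Each inner pass overwrites the same file, so only the last page survives.
            files[name] = pages[k]
    return files


def fixed_layout(pages):
    """Mimic the answer: one loop, file i gets page i."""
    files = {}
    for i, page in enumerate(pages):
        files[f"text{i}.txt"] = page
    return files


print(buggy_layout(pages))
print(fixed_layout(pages))
```

With `enumerate`, the index and the item arrive together, so there is no second loop variable that could clobber the first.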
    
    
    

    If this helps, please click the Accept button.
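The clean-up step in the answer (collect the `<p>` bodies, strip leftover tags, break after each 。) can be tried on an inline string before running it against real pages. This is a minimal sketch: the `sample` string and the name `clean_article` are made up for illustration. It also leaves out the `\0\0` suffix, because `\0` is a NUL character. If paragraph indentation was the goal, a visible character such as a full-width space would be needed instead, which is an assumption about the asker's intent.

```python
import re

# Same two patterns as in the thread, compiled once.
PARAGRAPH = re.compile(r'<p>([\s\S]*?)</p>')
TAG = re.compile(r'</?\w+[^>]*>')


def clean_article(html):
    """Join the bodies of all <p> elements, drop inner tags, break after each 。"""
    body = ''.join(PARAGRAPH.findall(html))
    no_tags = TAG.sub('', body)
    return no_tags.replace('。', '。\n\n')


sample = '<div><p>第一句。<b>第二句</b>。</p><p>第三句。</p></div>'
print(clean_article(sample))
```

Note that the pattern only matches a bare `<p>`. A paragraph written as `<p class="...">` is skipped, so a page that uses attributes on its paragraphs would produce an empty result.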

    The asker selected this answer as the best answer.

Question events

  • Closed by the system on December 22
  • Answer accepted on December 14
  • Question created on December 14
