解决python爬虫时遇到AttributeError的问题

在爬虫时出现attributeerror 在尝试过多次方法也未能得到解决希望各位能帮忙看下代码
代码如下：


```python

import requests
from bs4 import BeautifulSoup
import urllib.request
import xml.etree.ElementTree as ET
import time
import urllib.parse
import socket

User_Agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.56'
headers = {'User-Agent': User_Agent}

# 得到所有新闻rul链接
def get_url_list_025(url):
    init_page = requests.get(url, headers=headers).content
    re = urllib.request.Request(url, headers=headers)
    try:
        response = urllib.request.urlopen(re, timeout=10)
        html = response.read().decode("utf-8")  # 获得html文档
    except:
        print('爬取失败:', url)
    news_list = []
    root = 'https://china.huanqiu.com'
    f1 = open('国内要闻_国内新闻_环球网.html', 'r', encoding='utf-8')   # 从文件中读取HTML文档
    soup = BeautifulSoup(f1, "html.parser")  # 解析HTML文档
    f1.close()
    news_list1 = soup.find('div', class_='m-recommend-con')   # 先找到大的div标签
    items = news_list1.find_all('li')   # 找到所有‘li’标签
    for i, item in enumerate(items):
        if len(item) == 0:
            continue
        a = item.find('a')  # 找到链接标签
        title = a.find('h4').string     # 获得新闻标题
        url = a.get('href')    # 获得新闻链接
        if root in url:
            url = url[len(root):]
        date_time = a.find('span', class_="time").string  # 获得新闻时间
        date_time = str(date_time) + ':00'
        new_txt = [date_time, root+url, title]   # 组成一个列表形式
        news_list.append(new_txt)   # 添加到列表
    return news_list

def  crwal_url_list_025(url_list):
    doc_dir_path ='data/news/'            # 文档存储路径
    doc_encoding ='utf-8'                    # 编码格式
    n = 1
    for i, news in enumerate(url_list):
        print('爬取数量：%d/%d' % (i, len(url_list)))    # 打印爬取进度
        re = urllib.request.Request(news[1], headers=headers)
        try:
            response = urllib.request.urlopen(re, timeout=10)
            html = response.read()  # 获得html文档
        except socket.timeout as err: # 超时异常
            print('超时')
            print(err)
            print('休息10s')
            time.sleep(60)
            continue    # 继续
        except Exception as C:  # 其他类异常
            print('<%s, %s, %s>' % (type(C), C.reason, news[1]))
            print('休息5s')
            time.sleep(5)
            continue
        soup = BeautifulSoup(html, "html.parser")   # 解析HTML文档
        # print(soup.prettify())
        for each in soup('script'):     # 移除soup中script标签
            each.extract()
        try:
            ps = soup.find('div', class_='l-con clear').find('article').find_all('p')   # # 找到所有正文部分中p标签
        except Exception as C:
            print('%s, %s' % (type(C), news[1]))
            continue
        txt = ''
        for each in ps: # 每次读取一个P段落
            p = each.get_text().strip()     #去除首尾的空格
            if p == '': # 跳过空段落（因为正文是段与段之间是空段落隔开的）
                continue
            txt += '\t' + p + '\n'  # 格式控制（每个p段落后有换行）
        ze = soup.find('div', class_='l-sign').find('p', class_="edit-peo").get_text()  # 获取责任编辑人
        txt += ze
        txt = txt.replace(' ', '')  # 空格用逗号隔开

        if len(txt) < 150:    # 去掉新闻词数小于150的
            continue
        doc = ET.Element("doc")
        ET.SubElement(doc, 'id').text = "%d" % n    # 添加id号
        ET.SubElement(doc, "url").text = news[1]     # 添加url链接
        ET.SubElement(doc, "title").text = news[2]  # 添加标题
        ET.SubElement(doc, "datetime").text = news[0]  # 添加新闻时间
        ET.SubElement(doc, "body").text = txt       # 添加正文部分
        tree = ET.ElementTree(doc)  # 生成ElementTree对象
        tree.write((doc_dir_path + "%d.xml" % n), encoding=doc_encoding, xml_declaration=True) # 写入文件
        n += 1
        if n % 1000 == 0:       # 爬取了1000个新闻后睡眠20秒
            print("休息10秒")
            time.sleep(10)

if __name__ == "__main__":
    # url = 'https://china.huanqiu.com/focus'
    url = 'https://china.huanqiu.com'
    url_list = get_url_list_025(url)
    print('爬取%d个新闻' % len(url_list))
    crwal_url_list_025(url_list)

![img](https://img-mid.csdnimg.cn/release/static/image/mid/ask/612380972966162.png "#left")

![img](https://img-mid.csdnimg.cn/release/static/image/mid/ask/507080972966150.png "#left")

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

1条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
CSDN-Ada助手 CSDN-AI 官方账号 2022-11-24 18:45
关注
看下这篇博客，也许你就懂了，链接：python-AttributeError报错解决办法
解决
无用 1
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

Python学习开始遇到AttributeError问题 python
2021-09-28 09:49

回答 1 已采纳两个解决方法：一、把第二行改成： t = turtle.Turtle() 即可或者：二、第二行删除，7~10行的 t. 改成 turtle.
python问题 AttributeError:xx object has no attribute python
2022-11-09 12:37

回答 4 已采纳初始化函数名字有问题还有其他的若干问题，给你修复了，你着❤️画的不错 import random from math import sin, cos, pi, log from tkinter im
爬虫保存数据时遇到AttributeError: ‘list‘ object has no attribute ‘write‘ python 爬虫
2022-09-28 10:07

回答 2 已采纳哪一行报的错误
爬虫实战遇到的问题及解决汇总 / 爬虫原理介绍
2020-01-26 21:22

Quest_sec的博客如何写一个爬虫程序爬取豆瓣内容？
python爬虫，etree模块有问题 python 爬虫
2022-07-02 00:17

回答 1 已采纳你HtML 中T写成小写了, HTML应该是全大写另外 li 是 Element元素不能与字符串拼接. 需要用li.text获取元素中的文本print(li+'\n')fp.write(li+'\
python爬虫使用selenium切换窗口问题 python selenium 有问必答爬虫
2022-03-18 12:30

回答 2 已采纳 driver.swith_to.window(driver.window_handles[1]),函数名写错了，不是swith是switch，少写了个c，改成：driver.switch_to.win
爬虫遇到AttributeError: 'NoneType' object has no attribute 'children'该如何解决 python 爬虫
2022-11-19 12:53

回答 2 已采纳 request设置参数verify=False r = requests.get(url, timeout=30, verify=False)
Python爬虫学习过程中遇到的AttributeError: 'NoneType' object has no attribute 'children'问题解决
2020-04-09 17:42

supalonely的博客 Python爬虫学习过程中遇到的AttributeError: ‘NoneType’ object has no attribute 'children’问题解决在中国大学MOOC网上学习Python网络爬虫与信息提取时，运行嵩天老师的中国大学排名定向爬虫实例代码A时遇到了...
python爬虫在爬取页面时，修改页面编码出现问题AttributeError: 'NoneType' object has no attribute 'apparent_encoding' python 开发语言有问必答爬虫
2021-11-27 01:37

回答 3 已采纳那是因为gethtml函数中请求超时或其他原因，抛出异常，返回了None 值。在调用时出现None类型无apparent_encoding属性。你可以将timeout设置大些。如解答对你有帮助，请
pycharm运行python文件时出现AttributeError: module 'numpy' has no attribute 'asscalar'的报错怎么解决 batch python
2022-12-30 00:54

回答 3 已采纳望采纳！！点击该回答右侧的“采纳”按钮即可采纳！！这个错误的原因是在使用numpy模块的时候，调用了不存在的方法asscalar。解决办法有两种：1.将代码中调用的asscalar方法替换为正确的方法
爬虫代码提示AttributeError: ‘NoneType’ object has no attribute’strip’，怎么解决 python
2022-05-16 17:01

回答 2 已采纳但凡出现NoneType，表示你的变量值是None报错告诉你None没有strip方法那你就去代码里搜哪里用到了strip，然后看看字符串为什么会是None你在判断ts是''之前，不先判断一下td.s
python爬虫出现AttributeError: ‘NoneType‘ object has no attribute ‘text‘错误怎么办？
2021-02-08 16:57

非著名小田田的博客 python爬虫出现AttributeError: ‘NoneType‘ object has no attribute ‘text‘错误项目场景： python爬虫爬取小说（Jack cui网络爬虫教学实例）问题描述：遇到的问题：代码编译后出现AttributeError: ‘None...
爬虫AttributeError: 'NoneType' object has no attribute 'find' 的问题 python
2022-08-02 10:22

回答 5 已采纳你这个代码是在循环中多次执行的你不是每次循环 son_page中都有figure标签,第一次循环 son_page中就没有figure标签, son_tu = son_page.find("figur
python 爬虫代码
2023-03-09 21:49

qq_繁华的博客 python 爬虫代码
Python网络爬虫使用教程
2023-06-13 16:50

TTTALK的博客 python爬虫资源抓取--urllib/requests/requests-html、正则表达式、数据解析-Beautiful Soup/lxml/selectolax、自动化爬虫--selenium、爬虫框架--Scrapy/pyspider、模拟登录与验证码识别、autoscraper
没有解决我的问题, 去提问

问题事件

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
创建了问题 11月24日

悬赏问题

¥30 这是哪个作者做的宝宝起名网站
¥60 版本过低apk如何修改可以兼容新的安卓系统
¥25 由IPR导致的DRIVER_POWER_STATE_FAILURE蓝屏
¥50 有数据，怎么建立模型求影响全要素生产率的因素
¥50 有数据，怎么用matlab求全要素生产率
¥15 TI的insta-spin例程
¥15 完成下列问题完成下列问题
¥15 C#算法问题, 不知道怎么处理这个数据的转换
¥15 YoloV5 第三方库的版本对照问题
¥15 请完成下列相关问题！

解决python爬虫时遇到AttributeError的问题

1条回答 默认 最新

问题事件

悬赏问题

1条回答默认最新