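An unexplained error when using httpx with asyncio prevents fetching the target page's source code

While using httpx with asyncio for asynchronous programming, I ran into an unexplained error that prevents the target page's source code from being fetched. Details follow: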

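The code in question downloads images from wallhaven.cc. The image IDs are stored correctly in file.txt in the same directory. The program reads this file, builds the URL of each image's page from its stored ID, fetches that page's source code, locates the image element in it, then downloads and saves the image.

The URLs of the image files themselves follow no regular pattern, but the URLs of the pages that contain them do. So the program has to visit the image's page first and then find the image file's address inside that page.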
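
The first step described above (turning the IDs in file.txt into page URLs) could be sketched on its own like this. The URL pattern is the one used in the code further down; the names `BASE_PAGE_URL` and `build_page_urls` are made up for illustration, and skipping blank lines is an added assumption.

```python
from typing import Iterable, List

# Page URL pattern taken from the question's code.
BASE_PAGE_URL = "https://wallhaven.cc/w/"


def build_page_urls(lines: Iterable[str]) -> List[str]:
    """Turn image IDs (one per line) into the URLs of their pages."""
    urls = []
    for line in lines:
        image_id = line.strip()
        if not image_id:  # assumption: ignore blank lines in file.txt
            continue
        urls.append(BASE_PAGE_URL + image_id)
    return urls


if __name__ == "__main__":
    # A few IDs from the test data, plus a blank line.
    for url in build_page_urls(["2e31px", "", "k9v3om"]):
        print(url)
```

Keeping this step separate from the network code makes it easy to check that the URLs are right before any request is sent.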

The test data is as follows:
2e31px
Errortest
k9v3om
3k62g3
8x967o
2k9lqy
j813pm
rrjvyq
7prdye
5gr1w5
qzlwk5
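The ID 2e31px should normally produce a 404 status; for some reason this image seems to be no longer available on the site.
Errortest is test data passed in as if it were an image ID. It should produce an index out of range error, because the page at the URL generated from it is the site's error page and contains no image file.
For every other ID, the program should normally be able to reach a page containing an image through the generated URL, then find the image and download it into the same directory as the code.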
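
The three expected outcomes above can be written down as a small check that needs no network access. This is only a sketch: `classify_page` and its return strings are hypothetical, and index 2 is simply the position the code below uses (`image_urls[2]`).

```python
from typing import List


def classify_page(status_code: int, image_urls: List[str], index: int = 2) -> str:
    """Say which of the three expected outcomes a fetched page falls into."""
    if status_code == 404:
        return "not found"  # e.g. an image that is no longer available
    if index >= len(image_urls):
        # image_urls[index] would raise "list index out of range" here
        return "no image"
    return "ok"


if __name__ == "__main__":
    print(classify_page(404, []))
    print(classify_page(200, ["logo.png"]))
    print(classify_page(200, ["logo.png", "avatar.png", "full.jpg"]))
```

Checking the list length first turns the index out of range error into an ordinary, readable result.
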
However, the following code fails at runtime with an error whose cause I cannot determine:

import os
import random
import httpx
import asyncio
from lxml import html

from asyncio import Semaphore

semaphore = Semaphore(7)

async def record_error(url, error, error_file_path):
    with open(error_file_path, 'a') as error_file:
        error_file.write(f"{url}\n")
        error_file.write(f"{error}\n\n")

async def get_page(url, session):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive'
    }
    try:
        async with await session.get(url, headers=headers, timeout=60) as res:
            print("DBTAG 5")
            # Removed the parentheses; changed to res.text
            return await res.text
    except Exception as e:
        print(f"Error occurred while getting the page: {e}")
        await record_error(url, str(e), error_file_path)
        return None

async def download(url, session, error_file_path):
    try:
        print(f"Downloading image: {url}...")
        print("Setting headers...")
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
            'Host': 'w.wallhaven.cc',
            'Accept': 'image/avif,image/webp,*/*',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Referer': 'https://wallhaven.cc/'
        }
        print("session getting...")
        res = await session.get(url, headers=headers, timeout=60)
        assert res.status_code == 200
        res.raise_for_status()
        print(f"Succeeded in downloading {url}")
        return res.text
    except Exception as e:
        fail = os.path.basename(url)
        print(f"Download failed: {e}, {fail}")
        await record_error(url, str(e), error_file_path)
        return None

async def Cycle(session, line, error_file_path):
    try:
        url = f'https://wallhaven.cc/w/{line}'
        print("DBTAG 2")
        html_ = await get_page(url, session)
        print("DBTAG 3")
        print(url)
        print(html_)
        image_urls = html.fromstring(html_).xpath('//img/@src')
        image_elem = await download(image_urls[2], session, error_file_path)
        print("DBTAG 4")
        if image_elem:
            filename = os.path.basename(image_urls[2])
            with open(filename, 'wb') as f:
                f.write(image_elem.content)
    except Exception as e:
        print(f"Error occurred: {e}")
        fail = os.path.basename(url)
        print(f"Download failed: {fail}")
        await record_error(url, str(e), error_file_path)

if __name__ == "__main__":
    current_directory = os.getcwd()
    print("当前工作目录:", current_directory)
    file_path = os.path.join(current_directory, 'file.txt')

    with open('file.txt', 'r', buffering=20971520) as file:
        print("reading the file...")
        lines = file.read().splitlines()

    print("Initializing error saver.")
    error_file_path = os.path.join(current_directory, 'Error_Path.txt')
    with open(error_file_path, 'w') as error_file:
        error_file.write("Error Messages:\n")

    async def main():
        async with httpx.AsyncClient() as session:
            print("DBTAG 1")
            tasks = [Cycle(session, line, error_file_path) for line in lines]
            await asyncio.gather(*tasks)
    asyncio.run(main())
    print('Completed.')

The output of a run is as follows:

当前工作目录: I:\...(omitted here)
reading the file...
Initializing error saver.
DBTAG 1
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/5gr1w5
None
Error occurred: expected string or bytes-like object
Download failed: 5gr1w5
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/rrjvyq
None
Error occurred: expected string or bytes-like object
Download failed: rrjvyq
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/Errortest
None
Error occurred: expected string or bytes-like object
Download failed: Errortest
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/qzlwk5
None
Error occurred: expected string or bytes-like object
Download failed: qzlwk5
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/2e31px
None
Error occurred: expected string or bytes-like object
Download failed: 2e31px
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/j813pm
None
Error occurred: expected string or bytes-like object
Download failed: j813pm
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/k9v3om
None
Error occurred: expected string or bytes-like object
Download failed: k9v3om
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/3k62g3
None
Error occurred: expected string or bytes-like object
Download failed: 3k62g3
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/8x967o
None
Error occurred: expected string or bytes-like object
Download failed: 8x967o
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/7prdye
None
Error occurred: expected string or bytes-like object
Download failed: 7prdye
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/2k9lqy
None
Error occurred: expected string or bytes-like object
Download failed: 2k9lqy
Completed.
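
The message `__aexit__` in the output above is what older Python versions print when `async with` is given an object that is not an asynchronous context manager. Newer versions (3.11 and later) raise a `TypeError` with a longer message instead. A minimal sketch that reproduces this without any network access, using a made-up class `PlainResponse` as a stand-in for the response object:

```python
import asyncio


class PlainResponse:
    """Hypothetical stand-in: an object with no __aenter__/__aexit__ methods."""

    text = "<html></html>"


async def misuse_async_with() -> str:
    """Use `async with` on an unsupported object and report the exception type."""
    try:
        async with PlainResponse() as res:
            return res.text
    except (AttributeError, TypeError) as exc:
        # Older Pythons raise AttributeError naming the missing method;
        # Python 3.11+ raises TypeError with a longer message.
        return type(exc).__name__


if __name__ == "__main__":
    print(asyncio.run(misuse_async_with()))
```

In the question's `get_page`, the object handed to `async with` is the result of `await session.get(...)`, which appears to be the same situation. Because `get_page` then returns None, the later `html.fromstring(html_)` call fails with "expected string or bytes-like object".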

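What is causing the error, and how can I fix it?
(I only have a junior high school education and am a beginner, so please explain in simple terms.)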

1 answer

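CSDN-Ada助手 (official CSDN-AI account), 2023-08-13 19:02
[Related recommendation]

You may want to look at the answer to this question: https://ask.csdn.net/questions/677836

If you have already solved this problem, please consider sharing your solution as a blog post and leaving the link in the comments to help others ^-^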

Question events

Question closed: August 14
Question edited: August 13
Question created: August 13
