NiffrG 2023-08-13 15:36
Question closed

Unexplained error when using httpx with asyncio makes fetching the target page fail

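While writing asynchronous code with asyncio and httpx, I hit an error I can't explain, and it makes fetching the target page's source fail. Details below: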

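The code downloads images from wallhaven.cc. The image IDs are stored correctly in file.txt in the same directory. The program reads this file, builds the URL of each image's page from its ID, fetches that page's source, finds the image element in it, and downloads and saves the image.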
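The first two steps (read the IDs from file.txt, turn each ID into a page URL) need nothing beyond the standard library. The sketch below assumes one ID per line and skips blank lines; `parse_ids`, `read_ids` and `page_url` are hypothetical helper names, and the UTF-8 encoding is an assumption. It reads the file normally instead of passing the 20 MB `buffering` value the original script uses, which a small text file does not need.

```python
from pathlib import Path


def parse_ids(text):
    """Split file contents into IDs: one per line, surrounding spaces and blank lines dropped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_ids(path="file.txt"):
    # A plain read is enough for a small text file; no custom buffer size is needed.
    return parse_ids(Path(path).read_text(encoding="utf-8"))


def page_url(wall_id):
    # The page URL follows a fixed pattern, unlike the image URL itself.
    return f"https://wallhaven.cc/w/{wall_id}"
```

For example, `[page_url(i) for i in read_ids()]` gives one page URL per non-empty line of file.txt.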

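The URLs of the image files themselves follow no pattern, but the URLs of the pages that contain them do, so the program has to find the image's page first and then extract the image element's address from that page.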
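Picking the image out of the page by position (`xpath('//img/@src')[2]` in the code below) breaks as soon as the page contains a different number of `<img>` tags, and raises an `IndexError` when it contains fewer than three. A sturdier sketch selects the image by an attribute instead. It assumes, without having checked the live site, that the full-size wallpaper is the `<img>` with `id="wallpaper"`; it uses the standard library's `html.parser` in place of lxml only so that it runs without extra packages. `WallpaperFinder` and `find_wallpaper_src` are hypothetical names.

```python
from html.parser import HTMLParser


class WallpaperFinder(HTMLParser):
    """Remember the src of the first <img id="wallpaper"> in a page."""

    def __init__(self):
        super().__init__()
        self.wallpaper_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the full-size image is marked with id="wallpaper".
        if tag == "img" and attrs.get("id") == "wallpaper" and self.wallpaper_src is None:
            self.wallpaper_src = attrs.get("src")


def find_wallpaper_src(page_html):
    """Return the wallpaper URL, or None if the page has no such image (e.g. an error page)."""
    finder = WallpaperFinder()
    finder.feed(page_html)
    finder.close()
    return finder.wallpaper_src
```

Returning `None` rather than raising lets the caller treat "no image on this page" (the Errortest case) as an ordinary, loggable outcome.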

Test data:
2e31px
Errortest
k9v3om
3k62g3
8x967o
2k9lqy
j813pm
rrjvyq
7prdye
5gr1w5
qzlwk5
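For the ID 2e31px, a 404 status is expected; presumably this image is, for some reason, no longer available on the site.
Errortest is test data deliberately treated as an image ID. It should produce an index out of range error, because the URL generated from it leads to the site's error page, which contains no image file.
For all other IDs, the program should normally reach a page containing the image through the generated URL, find the image, and download it into the same directory as the code.
With the code below, however, the run fails with an error whose cause I can't identify: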

import os
import random
import httpx
import asyncio
from lxml import html

from asyncio import Semaphore

semaphore = Semaphore(7)

async def record_error(url, error, error_file_path):
    with open(error_file_path, 'a') as error_file:
        error_file.write(f"{url}\n")
        error_file.write(f"{error}\n\n")

async def get_page(url, session):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive'
    }
    try:
        async with await session.get(url, headers=headers, timeout=60) as res:
            print("DBTAG 5")
            # Removed the parentheses: res.text instead of res.text()
            return await res.text
    except Exception as e:
        print(f"Error occurred while getting the page: {e}")
        await record_error(url, str(e), error_file_path)
        return None

async def download(url, session, error_file_path):
    try:
        print(f"Downloading image: {url}...")
        print("Setting headers...")
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
            'Host': 'w.wallhaven.cc',
            'Accept': 'image/avif,image/webp,*/*',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Referer': 'https://wallhaven.cc/'
        }
        print("session getting...")
        res = await session.get(url, headers=headers, timeout=60)
        assert res.status_code == 200
        res.raise_for_status()
        print(f"Succeeded in downloading {url}")
        return res.text
    except Exception as e:
        fail = os.path.basename(url)
        print(f"Download failed: {e}, {fail}")
        await record_error(url, str(e), error_file_path)
        return None

async def Cycle(session, line, error_file_path):
    try:
        url = f'https://wallhaven.cc/w/{line}'
        print("DBTAG 2")
        html_ = await get_page(url, session)
        print("DBTAG 3")
        print(url)
        print(html_)
        image_urls = html.fromstring(html_).xpath('//img/@src')
        image_elem = await download(image_urls[2], session, error_file_path)
        print("DBTAG 4")
        if image_elem:
            filename = os.path.basename(image_urls[2])
            with open(filename, 'wb') as f:
                f.write(image_elem.content)
    except Exception as e:
        print(f"Error occurred: {e}")
        fail = os.path.basename(url)
        print(f"Download failed: {fail}")
        await record_error(url, str(e), error_file_path)

if __name__ == "__main__":
    current_directory = os.getcwd()
    print("当前工作目录:", current_directory)  # "Current working directory:"
    file_path = os.path.join(current_directory, 'file.txt')

    with open('file.txt', 'r', buffering=20971520) as file:
        print("reading the file...")
        lines = file.read().splitlines()

    print("Initializing error saver.")
    error_file_path = os.path.join(current_directory, 'Error_Path.txt')
    with open(error_file_path, 'w') as error_file:
        error_file.write("Error Messages:\n")

    async def main():
        async with httpx.AsyncClient() as session:
            print("DBTAG 1")
            tasks = [Cycle(session, line, error_file_path) for line in lines]
            await asyncio.gather(*tasks)
    asyncio.run(main())
    print('Completed.')
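The script creates `semaphore = Semaphore(7)` at module level, but no function ever acquires it, so it does not limit how many requests run at once. The sketch below shows the usual pattern with the standard library only: create the semaphore inside the running event loop (on Python 3.9 and earlier, an `asyncio.Semaphore` created at import time can end up bound to a different loop than the one `asyncio.run()` starts) and wrap each request in `async with semaphore`. `asyncio.sleep` stands in for the HTTP request; `worker`, `main` and `LIMIT` are illustrative names.

```python
import asyncio

LIMIT = 7  # same limit as the question's Semaphore(7)


async def worker(semaphore, state):
    async with semaphore:  # waits here while LIMIT workers are already inside
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # stands in for one network request
        state["running"] -= 1


async def main(n_tasks):
    # Created inside the running loop, not at import time.
    semaphore = asyncio.Semaphore(LIMIT)
    state = {"running": 0, "peak": 0}
    await asyncio.gather(*(worker(semaphore, state) for _ in range(n_tasks)))
    return state["peak"]  # the highest number of workers inside at the same moment


print(asyncio.run(main(11)))  # 7
```

With eleven IDs, as in the test data, at most seven requests are in flight at any moment.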

The output of a run:

当前工作目录: I:\...(omitted)
reading the file...
Initializing error saver.
DBTAG 1
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
DBTAG 2
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/5gr1w5
None
Error occurred: expected string or bytes-like object
Download failed: 5gr1w5
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/rrjvyq
None
Error occurred: expected string or bytes-like object
Download failed: rrjvyq
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/Errortest
None
Error occurred: expected string or bytes-like object
Download failed: Errortest
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/qzlwk5
None
Error occurred: expected string or bytes-like object
Download failed: qzlwk5
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/2e31px
None
Error occurred: expected string or bytes-like object
Download failed: 2e31px
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/j813pm
None
Error occurred: expected string or bytes-like object
Download failed: j813pm
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/k9v3om
None
Error occurred: expected string or bytes-like object
Download failed: k9v3om
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/3k62g3
None
Error occurred: expected string or bytes-like object
Download failed: 3k62g3
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/8x967o
None
Error occurred: expected string or bytes-like object
Download failed: 8x967o
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/7prdye
None
Error occurred: expected string or bytes-like object
Download failed: 7prdye
Error occurred while getting the page: __aexit__
DBTAG 3
https://wallhaven.cc/w/2k9lqy
None
Error occurred: expected string or bytes-like object
Download failed: 2k9lqy
Completed.
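Read from the log, every `get_page` call fails with the bare message `__aexit__`, returns `None`, and parsing `None` then fails with `expected string or bytes-like object`. A plausible explanation: `async with await session.get(...) as res` is the aiohttp idiom, while an httpx `Response` does not appear to implement the async context-manager methods `__aenter__`/`__aexit__`. Before Python 3.11 that failure is reported as an `AttributeError` whose message is just the missing method name; Python 3.11+ raises a `TypeError` instead. The follow-up message matches what Python's `re` module raises for `None`, which lxml's `html.fromstring` appears to hit when it inspects its input. This stdlib-only reproducer mimics both failures with a hypothetical `FakeResponse` class, so neither the network nor httpx is needed.

```python
import asyncio
import re


class FakeResponse:
    """Hypothetical stand-in for a response object that has no __aenter__/__aexit__."""

    text = "<html></html>"


async def fake_get(url):
    # Plays the role of `await session.get(url)`.
    return FakeResponse()


async def get_page(url):
    # Same shape as the question's code: `async with await session.get(...)`.
    async with await fake_get(url) as res:
        return res.text


def first_error():
    """Run get_page and report which exception the `async with` line raises."""
    try:
        asyncio.run(get_page("https://wallhaven.cc/w/k9v3om"))
    except (AttributeError, TypeError) as exc:
        # Before Python 3.11: AttributeError naming the missing method.
        # Python 3.11+: TypeError about the asynchronous context manager protocol.
        return type(exc).__name__
    return None


def second_error():
    """Return the message Python's re module gives when handed None instead of text."""
    try:
        re.match(r"\s*<html", None)
    except TypeError as exc:
        return str(exc)
    return None
```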

What causes this error, and how can I fix it?
(I only have a middle-school education and am a beginner, so please explain in plain terms.)

1 answer

  • CSDN-Ada Assistant (CSDN-AI official account) 2023-08-13 19:02


    If you have already solved this problem, we would love for you to share the solution in a blog post and put the link in the comments to help more people ^-^


Question history

  • Closed August 14
  • Question edited August 13
  • Question created August 13
