qq_56038802 2022-04-04 17:42 · Acceptance rate: 25%
Views: 26

Async crawler exception handling: try-based timeout check goes wrong

When scraping proxy IPs from a proxy-listing site, I use try/except to check whether each IP times out. But after putting await in front of session.get(), the check doesn't seem to happen at all: with the timeout set to 3 s, execution apparently jumps straight into except, and every IP gets printed as unusable. Yet when I test the same IPs by hand they work, and the script finishes very quickly, so it clearly never waits to check for a timeout.

Code below:

import asyncio
import json
from bs4 import BeautifulSoup
import aiohttp
import aiofiles
import random

async def get_ip(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as f:
            a = await f.text()
            bsl = BeautifulSoup(a,'html.parser')
            bss = bsl.find('table',width="100%").select('tr')[1:]
            for list in bss:
                ip = list.select('tr td')[0].text
                port = list.select('tr td')[1].text
                proxies={
                    f'https':f'https://{ip}:{port}'
                }
                asyncio.gather(verify(proxies))


async def verify(proxies):
    async with aiohttp.ClientSession() as session:
        try:
            f = session.get('https://www.baidu.com',proxies=random.choice(proxies),async_timeout = 3)
            print('Usable proxy: {}'.format(proxies))
            await write_json(proxies)
        except:
            print('Unusable: {}'.format(proxies))



async def write_json(proxies):
    async with aiofiles.open('ip处理池.json','a') as f:
        await json.dump(proxies,f)


async def rea_json():
    async with aiofiles.open('ip处理池.json','r')as f:
        for i in f.readlines():
            content = json.loads(i.strip())
            print(content)


async def main():
    tasks = []
    for i in range(100):
        url = f'http://www.66ip.cn/{i}.html'
        tasks.append(asyncio.create_task(get_ip(url)))
    await asyncio.wait(tasks)



if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())




1 answer

  • ~白+黑 (Python rising-star creator) 2022-04-04 21:05
        f = session.get('https://www.baidu.com',proxies=random.choice(proxies),async_timeout = 3)  # random.choice throws an error right here
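    To see why, in isolation (a minimal sketch; the dict mirrors the proxies built in get_ip, and the address is a placeholder):

    import random

    proxies = {'https': 'https://1.2.3.4:8080'}  # placeholder, same shape as the dict from get_ip

    # random.choice expects a sequence and indexes it with a random integer;
    # on a dict that becomes proxies[0], which raises KeyError: 0 before
    # session.get() is even evaluated -- so control drops straight into the
    # bare except branch and every proxy is reported as unusable.
    random.choice(proxies)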
    
    
    async def get_ip(url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as f:
                a = await f.text()
                bsl = BeautifulSoup(a,'html.parser')
                bss = bsl.find('table',width="100%").select('tr')[1:]
                for list in bss:
                    ip = list.select('tr td')[0].text
                    port = list.select('tr td')[1].text
                    proxies={
                        f'https':f'https://{ip}:{port}'
                    }
                    asyncio.gather(verify(proxies))  # no need for concurrency here: each call is a single coroutine, plain await verify(proxies) would do

    async def verify(proxies):
        async with aiohttp.ClientSession() as session:
            try:
                # random.choice(proxies): the dict only ever holds one entry, so there is
                # nothing to pick at random; the call errors out at this point, and dict
                # is not a type random.choice supports in the first place.
                f = session.get('https://www.baidu.com',proxies=random.choice(proxies),async_timeout = 3)
                print('Usable proxy: {}'.format(proxies))
                await write_json(proxies)
            except:  # better to print the concrete error type for debugging: except Exception as e
                print('Unusable: {}'.format(proxies))
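
    Putting the fixes together, a minimal corrected sketch of verify and write_json. These are assumptions about the intent, not the original code: aiohttp takes a single proxy= URL string rather than a requests-style proxies= dict; it has no async_timeout argument (timeouts go through aiohttp.ClientTimeout); the request must actually be awaited for the 3 s limit to apply; aiohttp only accepts http:// proxy URLs, so the scheme is rewritten here; and aiofiles needs await f.write(...) instead of json.dump.

    import json

    import aiofiles
    import aiohttp


    async def verify(proxies):
        # aiohttp wants one proxy URL string, and only the http:// scheme is accepted
        proxy_url = next(iter(proxies.values())).replace('https://', 'http://', 1)
        timeout = aiohttp.ClientTimeout(total=3)  # the actual timeout API; async_timeout= does not exist
        async with aiohttp.ClientSession(timeout=timeout) as session:
            try:
                # async with actually awaits the request, so a dead proxy now fails
                # with asyncio.TimeoutError after ~3 s instead of instantly in random.choice
                async with session.get('https://www.baidu.com', proxy=proxy_url) as resp:
                    if resp.status == 200:
                        print('Usable proxy: {}'.format(proxies))
                        await write_json(proxies)
            except Exception as e:  # surface the concrete error while debugging
                print('Unusable: {} ({!r})'.format(proxies, e))


    async def write_json(proxies):
        # json.dump(obj, f) calls f.write() synchronously, which does not work with an
        # aiofiles handle; serialize first, then await the asynchronous write
        async with aiofiles.open('ip处理池.json', 'a') as f:
            await f.write(json.dumps(proxies) + '\n')

    In get_ip, plain await verify(proxies) then replaces the un-awaited asyncio.gather(verify(proxies)) call, as noted in the comment above.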
    


