wusaicyq 2023-05-30 18:49 Acceptance rate: 18.2%
Views: 10

Scrapy framework, crawler, middleware

## Spider code
import scrapy


class MiddleSpider(scrapy.Spider):
    name = "middle"
    #allowed_domains = ["www.xxx.com"]
    start_urls = ["http://www.baidu.com/s?wd=ip"]

    def parse(self, response):
        # Save the response body to ip.html so it can be inspected afterwards.
        page_text = response.text

        with open("ip.html", "w", encoding="utf-8") as fp:
            fp.write(page_text)



## Scrapy middleware
from scrapy import signals

# useful for handling different item types with a single interface
from itemadapter import is_item, ItemAdapter

import random


class MiddleproDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
    ]
    PROXY_http = ["114.231.42.244", "183.236.232.160"]
    PROXY_https = ["120.83.49.90:9000", "95.189.112.214:35508"]

    # Intercept requests
    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agent_list)
        # Check whether setting a proxy actually takes effect
        request.meta["proxy"] = "http://182.139.110.18"
        return None
    # Intercept all responses
    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response
    # Intercept requests that raised an exception
    def process_exception(self, request, exception, spider):
        if request.url.split(":")[0] == "http":
            # Switch to a proxy
            request.meta["proxy"] = "http://" + random.choice(self.PROXY_http)
        else:
            request.meta["proxy"] = "https://" + random.choice(self.PROXY_https)

        return request  # Resend the corrected request object
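The following error appears:
2023-05-30 18:45:14 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://www.baidu.com/s?wd=ip> (failed 3 times): TCP connection timed out: 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.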
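
The timeout above is raised before any response arrives, so `process_exception` is the only hook in the middleware that gets to react to it. As written, that hook returns the request unconditionally, which may cause the same request to be rescheduled again and again when every proxy is dead. Below is a minimal sketch of a bounded variant, not a drop-in fix. The class name, the `proxy_attempts` meta key, the cap of 3, and the proxy addresses (taken from a documentation-only IP range) are all assumptions for illustration. It also assumes the proxies are plain HTTP proxies, so the proxy URL starts with `http://` regardless of the target URL's scheme.

```python
import random


class BoundedProxyMiddleware:
    """Sketch: switch proxy on download errors, but only a limited number of times."""

    # Hypothetical placeholder proxies (documentation-only address range);
    # replace them with proxies you have verified. Each entry has a port.
    PROXIES = ["203.0.113.10:8080", "203.0.113.11:3128"]
    MAX_PROXY_ATTEMPTS = 3  # assumed cap, not a built-in Scrapy setting

    def process_exception(self, request, exception, spider):
        attempts = request.meta.get("proxy_attempts", 0)  # hypothetical meta key
        if attempts >= self.MAX_PROXY_ATTEMPTS:
            # Returning None lets Scrapy continue its normal exception handling.
            return None
        request.meta["proxy_attempts"] = attempts + 1
        # Assumes plain HTTP proxies, so the proxy URL itself uses http://
        # even when the target URL is https.
        request.meta["proxy"] = "http://" + random.choice(self.PROXIES)
        # The same URL is being scheduled again, so skip the duplicate filter.
        request.dont_filter = True
        return request
```

To try it, the class would be listed under `DOWNLOADER_MIDDLEWARES` in `settings.py` in place of the original middleware.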

2 answers

  • 青霄 2023-05-30 18:52

    Test whether your proxy actually works.
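
One way to follow this advice outside Scrapy is a small standalone check using only the standard library. This is a rough sketch, not an official tool: the function name `proxy_works`, the default test URL (reused from the question), and the 5-second timeout are assumptions. Note that a proxy URL without a port, such as `http://182.139.110.18` in the question, would normally fall back to the default port 80, which is often not where a proxy listens.

```python
import http.client
import urllib.request


def proxy_works(proxy_url, test_url="http://www.baidu.com/s?wd=ip", timeout=5):
    """Return True if test_url can be fetched through proxy_url, else False."""
    # Route both http and https traffic through the candidate proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

Calling `proxy_works("http://120.83.49.90:9000")` for each candidate before putting it into the middleware would show which proxies, if any, are still reachable. Free proxies tend to go offline quickly, so a `False` result here would be consistent with the 10060 timeout in the log.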
