Scrapy CrawlSpider OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]

Symptoms and background

When I crawl a site with Scrapy's CrawlSpider template, I get the following error.

2022-05-22 18:12:45 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xxx/xxx/xxxx> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]

Relevant code

My spider code:

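import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

# DushuItem comes from the project's items.py; its import is not shown in the question.


class ReadSpider(CrawlSpider):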
    name = 'read'
    allowed_domains = ['xxxxx']
    start_urls = ['xxxxx']

    rules = (
        Rule(LinkExtractor(allow=r'/book/\d+_\d+\.html'), callback='parse_item', follow=True),
        Rule(LinkExtractor(restrict_xpaths='//div[@id="tab1"]/div[@class="class-nav"]/a'), callback='parse_item',
             follow=True),
    )
    # follow=True: keep extracting links from the responses that the rule's callback has handled

    def parse_item(self, response):
        # item['domain_id'] = response.xpath('//input[@id="sid"]/@value').get()
        # item['name'] = response.xpath('//div[@id="name"]').get()
        # item['description'] = response.xpath('//div[@id="description"]').get()
        books = response.xpath('//div[@class="bookslist"]/ul/li/div')
        for book in books:
            item = DushuItem()
            name = book.xpath('./h3/a/text()').get()
            p = book.xpath('./p')
            author = p[0].xpath('./text()').get()
            info = p[1].xpath('./text()').get()
            imgUrl = book.xpath('./div/a/img/@src').get()
            infoUrl = 'http://xxxxxx.com' + book.xpath('./h3/a/@href').get()
            item['name'] = '《' + name + '》'
            item['author'] = author
            item['imgUrl'] = imgUrl
            item['info'] = info
            # return scrapy.Request(url=infoUrl, callback=self.parseDetail, meta={'item': item})
            # yield item
            print('------------',infoUrl[0:len(infoUrl)-1])
            # print(infoUrl[1:])
            yield scrapy.Request(url=infoUrl[0:len(infoUrl)-1], callback=self.parseDetail, meta={'item': item})

    def parseDetail(self, response):
        item = response.meta['item']
        infoS = response.xpath('//div[@class="text txtsummary"]/text()').get()
        # info = re.findall(r'\u3000\u3000(.*)', infoS)
        # info = re.sub(r'"', r'\"', infoS)
        # if info:
        #     item['info'] = info[0]
        # else:
        #     item['info'] = infoS
        item['info'] = infoS
        yield item

My reasoning and what I have tried

If I replace yield scrapy.Request(url=infoUrl[0:len(infoUrl)-1], callback=self.parseDetail, meta={'item': item}) with yield item, the crawl works normally. Why is that?
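
Since the crawl only fails once the detail-page request is added, one thing worth checking before sending that request is the exact URL being built. The following is a minimal sketch of such a check and not a confirmed fix for the SSL error; the helper name build_detail_url, the host example.com, and the paths are all hypothetical, because the real site is elided in the question.

```python
from urllib.parse import urljoin, urlparse


def build_detail_url(page_url: str, href: str) -> str:
    """Resolve href against the page URL and reject anything that is not http(s)."""
    url = urljoin(page_url, href.strip())
    scheme = urlparse(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unexpected URL scheme {scheme!r} in {url!r}")
    return url


# Hypothetical values for illustration only.
print(build_detail_url("http://example.com/book/1001_1.html", "/book/12345/"))
```

Inside a Scrapy callback, response.urljoin(href) performs the same resolution against the current page, so the hard-coded host prefix and the manual string slicing may not be needed at all.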



Question events

Closed by the system on May 30
Question edited on May 22
Question created on May 22
