Scrapy cannot download images when crawling sc.chinaz.com (站长素材)

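I am working through a course example. I have checked the earlier code and found no problems, and the program runs normally. After the image URLs are extracted, the requests to download them fail. I have checked the storage path and the file name: the folder gets created, but no images are saved. Please help. (I have confirmed that there is no cookie-based or hotlink-protection anti-scraping mechanism, and the image URLs open normally.)
Here is the code.
Spider file: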

# -*- coding:utf-8 -*-
import scrapy
from imgsPro.items import ImgsproItem

class ImgSpider(scrapy.Spider):
    name = 'img'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['https://sc.chinaz.com/tupian/']

    def parse(self, response):
        div_list = response.xpath('//*[@id="container"]/div')
        for div in div_list:
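            # The images are lazy-loaded: the attribute is src once a browser has loaded
            # the image, but src2 when the page is fetched without a browser.
            # Note: use the placeholder attribute (not necessarily src2; it may have another name)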
            src2 = 'http:'+div.xpath('./div/a/img/@src2').extract_first()
            #print(src2)
            
            item = ImgsproItem()
            item['src2'] = src2

            yield item
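
A side note on the URL construction in `parse`: `extract_first()` returns `None` when a `div` has no `src2` attribute, and `'http:' + None` then raises a `TypeError`. Below is a minimal sketch of a more defensive way to build the absolute URL with the standard library only. The helper name `build_image_url` is hypothetical and not part of the original project; inside a spider, `response.urljoin(...)` should serve the same purpose.

```python
from urllib.parse import urljoin

PAGE_URL = 'https://sc.chinaz.com/tupian/'


def build_image_url(page_url, raw_src):
    """Return an absolute image URL, or None when the attribute is missing."""
    if not raw_src:
        return None
    # urljoin resolves protocol-relative values ('//host/path') against the
    # scheme of the page URL instead of a hard-coded 'http:' prefix.
    return urljoin(page_url, raw_src.strip())


print(build_image_url(PAGE_URL, '//scpic.chinaz.net/Files/pic/pic9/202107/bpic23823_s.jpg'))
print(build_image_url(PAGE_URL, None))
```

Note that this takes the scheme from the page URL (`https` here), so the resulting URLs differ from the `http://` ones printed in the question.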

settings:

# Scrapy settings for imgsPro project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'imgsPro'

SPIDER_MODULES = ['imgsPro.spiders']
NEWSPIDER_MODULE = 'imgsPro.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
LOG_LEVEL = 'ERROR'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'imgsPro.middlewares.ImgsproSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'imgsPro.middlewares.ImgsproDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'imgsPro.pipelines.imgsPilepline': 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

# Directory where the images are stored (created automatically if it does not exist)
IMAGES_STORE = './imgs'

items:

import scrapy


class ImgsproItem(scrapy.Item):
    # define the fields for your item here like:
    src2 = scrapy.Field()
    # pass

pipelines:

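import scrapy
from scrapy.pipelines.images import ImagesPipeline

class imgsPilepline(ImagesPipeline):

    # Request the image data based on the image URL
    def get_media_requests(self, item, info):
        print(item['src2'])
        #yield scrapy.Request(item['src2'])  # no callback is needed for parsing
        yield scrapy.Request(url=item['src2'])

    # Specify where the image is stored
    def file_path(self, request, response=None, info=None, *, item=None):
        # The directory is set in settings:
            # IMAGES_STORE = './imgs' (created automatically if it does not exist)

        imgName = 'test.jpg'  # request.url.split('/')[-1]
        return imgName  # only the image file name needs to be returned

    def item_completed(self, results, item, info):
        print(results)  # for testing
        return item  # pass the item to the next pipeline class to run (can be omitted if there is none)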
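
Because `file_path` above returns the constant `'test.jpg'`, every successful download would be written to the same file. The commented-out expression `request.url.split('/')[-1]` hints at the alternative of one file per image. Here is a minimal sketch of that idea as a standalone function; `file_name_from_url` and its `default` parameter are hypothetical names, not part of the original project.

```python
import posixpath
from urllib.parse import urlparse


def file_name_from_url(url, default='unnamed.jpg'):
    """Derive a file name from the last path segment of a URL."""
    # urlparse drops any query string or fragment from the path component.
    path = urlparse(url).path
    name = posixpath.basename(path)
    # Fall back to a default when the URL ends with '/' or has no path.
    return name or default


print(file_name_from_url('http://scpic2.chinaz.net/Files/pic/pic9/202107/bpic23825_s.jpg'))
```

Inside the pipeline, `file_path` could then return `file_name_from_url(request.url)` instead of the fixed name.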

Output:

(pythonProject) C:\Users\13564\Desktop\pythonProject\imgsPro>scrapy crawl img
http://scpic2.chinaz.net/Files/pic/pic9/202107/bpic23825_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/bpic23823_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/bpic23824_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/bpic23826_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/bpic23828_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/bpic23827_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/apic34194_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/apic34190_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/apic34189_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/apic34191_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/apic34193_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/apic34192_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/hpic4260_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/hpic4257_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/hpic4259_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/hpic4256_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/hpic4255_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/hpic4258_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/apic34327_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34251_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34253_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34250_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34249_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34252_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34254_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/bpic23818_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/bpic23822_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/bpic23819_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/bpic23817_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/bpic23821_s.jpg
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
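
Each printed list is the `results` argument of `item_completed`: a list of `(success, value)` tuples, one per request yielded by `get_media_requests`. When `success` is `False`, `value` is a Twisted `Failure`; when it is `True`, `value` is a dict describing the downloaded file, with keys such as `url` and `path`. Below is a minimal sketch for summarising such a list. It uses plain Python stand-ins rather than real Scrapy objects, and `summarise_results` and the sample data are hypothetical.

```python
def summarise_results(results):
    """Split pipeline results into saved file paths and failure descriptions."""
    saved = [value['path'] for ok, value in results if ok]
    failed = [repr(value) for ok, value in results if not ok]
    return saved, failed


# Stand-in data: one successful download and one failure.
sample = [
    (True, {'url': 'http://example.com/a.jpg', 'path': 'a.jpg', 'checksum': 'abc'}),
    (False, Exception('download-error')),
]

saved, failed = summarise_results(sample)
print(saved)
print(failed)
```

In the output above every tuple starts with `False`, so none of the 30 requests produced a file.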

Please help.

m0_58990004 2021-08-03 22:08
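I found the cause: you need to add MEDIA_ALLOW_REDIRECTS = True to settings. It seems to be related to middleware, which I have not learned yet, so I do not know what it means. Can someone explain?
If you look at the full log, you will see that an error was in fact reported. I pasted it into Baidu and was told that adding the line above would fix it. However, with LOG_LEVEL = 'ERROR' in settings, that message is not shown.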
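
On the open question in this answer: according to the Scrapy documentation, `MEDIA_ALLOW_REDIRECTS` is a setting of the media pipelines (`FilesPipeline` and `ImagesPipeline`) rather than of a middleware. By default these pipelines treat an HTTP redirect on a media URL as a failed download; setting it to `True` lets them follow the redirect. The message about the failed download appears to be logged below the `ERROR` level, which would explain why `LOG_LEVEL = 'ERROR'` hid it. Below is a sketch of the relevant settings lines, assuming the rest of the file stays as posted. The choice of `'WARNING'` is an assumption for debugging and is not part of the answer.

```python
# settings.py (fragment)

# Let FilesPipeline / ImagesPipeline follow HTTP redirects on media URLs.
MEDIA_ALLOW_REDIRECTS = True

# Assumption: lower the threshold while debugging so that warnings are visible.
LOG_LEVEL = 'WARNING'

# Unchanged from the question.
IMAGES_STORE = './imgs'
```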

This answer was selected as the best answer by the asker.

Question events

Closed by the system: August 11
Answer accepted: August 3
Question created: August 3
