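I am following a course and working through one of its examples. I have checked the code below and the program runs without crashing, but once the spider has the image URLs, the download requests fail. I have checked the storage path and the file name: the folder is created, but no images are saved. Please help. (I have confirmed that there is no cookie-based or anti-hotlinking anti-scraping mechanism, and the image URLs open normally.)
Here is the code:
Spider source file: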
# -*- coding:utf-8 -*-
import scrapy
from imgsPro.items import ImgsproItem


class ImgSpider(scrapy.Spider):
    name = 'img'
    # allowed_domains = ['www.xxx.com']
    start_urls = ['https://sc.chinaz.com/tupian/']

    def parse(self, response):
        div_list = response.xpath('//*[@id="container"]/div')
        for div in div_list:
            # The images are lazy-loaded: the attribute is src once a browser has
            # rendered the image, but src2 in the raw HTML fetched without a browser.
            # Note: read the placeholder attribute (not necessarily src2; it may have another name).
            src2 = 'http:' + div.xpath('./div/a/img/@src2').extract_first()
            # print(src2)
            item = ImgsproItem()
            item['src2'] = src2
            yield item
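The 'http:' prefix in parse assumes that every src2 value is protocol-relative (starts with //) and is never missing. A minimal sketch of a more defensive way to build the URL, using urljoin from the standard library, might look like the following. The helper name absolute_img_url and the PAGE_URL constant are hypothetical and not part of the project.

```python
from urllib.parse import urljoin

# Assumed page URL, taken from start_urls in the spider above.
PAGE_URL = 'https://sc.chinaz.com/tupian/'


def absolute_img_url(page_url, src):
    """Resolve a protocol-relative, relative, or absolute src against the page URL.

    Returns None when the attribute is missing so the caller can skip the item
    instead of failing on 'http:' + None.
    """
    if not src:
        return None
    return urljoin(page_url, src.strip())


print(absolute_img_url(PAGE_URL, '//scpic.chinaz.net/Files/pic/pic9/202107/bpic23823_s.jpg'))
# https://scpic.chinaz.net/Files/pic/pic9/202107/bpic23823_s.jpg
```

Note that this yields an https URL because the page itself is served over https, whereas the spider above forces http. Inside parse, response.urljoin(src) performs the same resolution against response.url.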
settings:
# Scrapy settings for imgsPro project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://docs.scrapy.org/en/latest/topics/settings.html
# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'imgsPro'
SPIDER_MODULES = ['imgsPro.spiders']
NEWSPIDER_MODULE = 'imgsPro.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
LOG_LEVEL = 'ERROR'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32
# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16
# Disable cookies (enabled by default)
#COOKIES_ENABLED = False
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#}
# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'imgsPro.middlewares.ImgsproSpiderMiddleware': 543,
#}
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
# 'imgsPro.middlewares.ImgsproDownloaderMiddleware': 543,
#}
# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'imgsPro.pipelines.imgsPilepline': 300,
}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
# Directory where the images are stored (created automatically if it does not exist)
IMAGES_STORE = './imgs'
items:
import scrapy


class ImgsproItem(scrapy.Item):
    # define the fields for your item here like:
    src2 = scrapy.Field()
    # pass
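pipelines:
import scrapy
from scrapy.pipelines.images import ImagesPipeline


class imgsPilepline(ImagesPipeline):
    # Issue a request for the image data based on the image URL
    def get_media_requests(self, item, info):
        print(item['src2'])
        # yield scrapy.Request(item['src2'])  # no callback is needed to parse the response
        yield scrapy.Request(url=item['src2'])

    # Specify where the image is stored
    def file_path(self, request, response=None, info=None, *, item=None):
        # The storage directory is configured in settings.py:
        # IMAGES_STORE = './imgs' (created automatically if it does not exist)
        imgName = 'test.jpg'  # request.url.split('/')[-1]
        return imgName  # only the file name needs to be returned

    def item_completed(self, results, item, info):
        print(results)  # for debugging
        return item  # hand the item to the next pipeline class (can be omitted if there is none)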
Output:
(pythonProject) C:\Users\13564\Desktop\pythonProject\imgsPro>scrapy crawl img
http://scpic2.chinaz.net/Files/pic/pic9/202107/bpic23825_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/bpic23823_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/bpic23824_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/bpic23826_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/bpic23828_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/bpic23827_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/apic34194_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/apic34190_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/apic34189_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/apic34191_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/apic34193_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/apic34192_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/hpic4260_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/hpic4257_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/hpic4259_s.jpg
http://scpic3.chinaz.net/Files/pic/pic9/202107/hpic4256_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/hpic4255_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/hpic4258_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/apic34327_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34251_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34253_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34250_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34249_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34252_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/apic34254_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/bpic23818_s.jpg
http://scpic1.chinaz.net/Files/pic/pic9/202107/bpic23822_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/bpic23819_s.jpg
http://scpic2.chinaz.net/Files/pic/pic9/202107/bpic23817_s.jpg
http://scpic.chinaz.net/Files/pic/pic9/202107/bpic23821_s.jpg
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
[(False, <twisted.python.failure.Failure scrapy.pipelines.files.FileException: download-error>)]
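Each list printed above is the results argument of item_completed: a list of (success, value) tuples, where value is a dict with keys such as url, path and checksum on success and a Failure object on failure. Because LOG_LEVEL is set to 'ERROR', warning-level log lines are suppressed, so these tuples are the only visible clue. A minimal sketch of how the two cases could be separated for inspection follows. The helper name split_results and the sample data are hypothetical, and a plain Exception stands in for the Failure object so the sketch runs without Scrapy.

```python
def split_results(results):
    """Separate the (success, value) pairs that item_completed receives."""
    downloaded = [value for ok, value in results if ok]
    failed = [value for ok, value in results if not ok]
    return downloaded, failed


# Stand-in data shaped like the real argument (not actual crawl output).
sample = [
    (True, {'url': 'http://example.com/a.jpg', 'path': 'a.jpg', 'checksum': 'abc'}),
    (False, Exception('download-error')),
]

downloaded, failed = split_results(sample)
print(len(downloaded), len(failed))  # 1 1
for failure in failed:
    print(repr(failure))
```

In the real pipeline, every item above would land in the failed list, since each printed tuple starts with False.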
Any help would be appreciated.