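How to save images to a folder (language: Python)

Target site: http://www.bbsnet.com/doutu
Requirements:
1. Use the Scrapy framework to download every meme image in the "斗图" (doutu) topic into a folder.

db.py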

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class DbSpider(CrawlSpider):
    name = 'db'
    allowed_domains = ['www.bbsnet.com']
    start_urls = ['http://www.bbsnet.com/doutu']
    rules = (
        # Follow every post page ending in .html; the dots are escaped so they match literally.
        Rule(LinkExtractor(allow=(r'http://www\.bbsnet\.com/\w+\.html')), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # Post title as plain text rather than the whole <h1> element.
        title = response.xpath('normalize-space(//*[@id="content"]/div[1]/h1)').get()
        # Image address from the first paragraph of the post; it may be relative.
        url = response.xpath('//*[@id="post_content"]/p[1]/img/@src').get()
        if not url:
            self.logger.warning('No image found on %s', response.url)
            return None
        # Hand both fields to the item pipeline instead of returning an empty item.
        return {'title': title, 'url': response.urljoin(url)}

items.py

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class BiaoqingbaoItem(scrapy.Item):
    # Fields filled in by the spider for each post page
    url = scrapy.Field()    # address of the image to download
    title = scrapy.Field()  # title of the post the image belongs to

settings.py

# Scrapy settings for baioqing project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'baioqing'

SPIDER_MODULES = ['baioqing.spiders']
NEWSPIDER_MODULE = 'baioqing.spiders'

LOG_LEVEL = 'WARNING'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'baioqing (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'baioqing.middlewares.BaioqingSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'baioqing.middlewares.BaioqingDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
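# Folder where Scrapy's ImagesPipeline saves downloaded images
# (raw string, so the backslash is kept literally)
IMAGES_STORE = r"E:\images"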

ITEM_PIPELINES = {
   'baioqing.pipelines.BaioqingPipeline': 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

pipelines.py

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
 
 
# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
 
 
class BaioqingPipeline:
    def process_item(self, item, spider):
        return item

middlewares.py

# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html
 
from scrapy import signals
 
# useful for handling different item types with a single interface
from itemadapter import is_item, ItemAdapter
 
 
class MinSpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.
 
    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s
 
    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.
 
        # Should return None or raise an exception.
        return None
 
    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.
 
        # Must return an iterable of Request, or item objects.
        for i in result:
            yield i
 
    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.
 
        # Should return either None or an iterable of Request or item objects.
        pass
 
    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn’t have a response associated.
 
        # Must return only requests (not items).
        for r in start_requests:
            yield r
 
    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
 
 
class MinDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.
 
    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s
 
    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.
 
        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None
 
    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.
 
        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response
 
    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.
 
        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass
 
    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)

2 answers

honestman_ 2022-09-26 18:34
If you only post questions here and never accept an answer once the problem is solved, who is going to bother helping you?

This answer was selected by the asker as the best answer.


Question events

Question closed: September 27
Answer accepted: September 27
Question created: September 26
