qq_41867864 2020-04-16 11:07 Acceptance rate: 0%
Views: 29
Closed

Scrapy with Redis does not crawl any data [bounty can be raised]

The distributed crawler keeps reporting: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

The original GitHub repository is https://github.com/CUHKSZ-TQL/WeiboSpider_SentimentAnalysis

After setting up the environment, I modified the code. My modified version is here:

Link: https://pan.baidu.com/s/1jHbz7ak8VqO-MMHeGj9_UA

Extraction code: iecl

Running the third program produces the following output:

= RESTART: C:\Users\ap645\Desktop\WeiboSpider_SentimentAnalysis-master\WeiboSpider\sina\spiders\weibo_spider.py

2020-04-16 11:04:10 [scrapy.utils.log] INFO: Scrapy 2.0.1 started (bot: sina)

2020-04-16 11:04:10 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1f 31 Mar 2020), cryptography 2.9, Platform Windows-10-10.0.18362-SP0

2020-04-16 11:04:10 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor

2020-04-16 11:04:10 [scrapy.crawler] INFO: Overridden settings:

{'BOT_NAME': 'sina',

 'DOWNLOAD_DELAY': 2,

 'DUPEFILTER_CLASS': 'scrapy_redis_bloomfilter.dupefilter.RFPDupeFilter',

 'NEWSPIDER_MODULE': 'sina.spiders',

 'SCHEDULER': 'scrapy_redis_bloomfilter.scheduler.Scheduler',

 'SPIDER_MODULES': ['sina.spiders']}

2020-04-16 11:04:10 [scrapy.extensions.telnet] INFO: Telnet Password: 3c9f648b6ca7a947

2020-04-16 11:04:10 [scrapy.middleware] INFO: Enabled extensions:

['scrapy.extensions.corestats.CoreStats',

 'scrapy.extensions.telnet.TelnetConsole',

 'scrapy.extensions.logstats.LogStats']

2020-04-16 11:04:10 [weibo_spider] INFO: Reading start URLs from redis key 'weibo_spider:start_urls' (batch size: 16, encoding: utf-8

2020-04-16 11:04:12 [scrapy.middleware] INFO: Enabled downloader middlewares:

['sina.middlewares.RedirectMiddleware',

 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',

 'sina.middlewares.CookieMiddleware',

 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',

 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',

 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',

 'scrapy.downloadermiddlewares.retry.RetryMiddleware',

 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',

 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',

 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',

 'scrapy.downloadermiddlewares.stats.DownloaderStats']

2020-04-16 11:04:12 [scrapy.middleware] INFO: Enabled spider middlewares:

['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',

 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',

 'scrapy.spidermiddlewares.referer.RefererMiddleware',

 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',

 'scrapy.spidermiddlewares.depth.DepthMiddleware']

2020-04-16 11:04:12 [scrapy.middleware] INFO: Enabled item pipelines:

['sina.pipelines.MongoDBPipeline']

2020-04-16 11:04:12 [scrapy.core.engine] INFO: Spider opened

2020-04-16 11:04:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

2020-04-16 11:04:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023

2020-04-16 11:05:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
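
One thing the log itself points to: the spider reports that it is reading start URLs from the Redis key 'weibo_spider:start_urls'. A scrapy-redis style spider sits idle at 0 pages until something is pushed to that key, so an empty key would produce exactly this output. The sketch below shows one way to seed the key. It assumes the key is a plain Redis list (the scrapy-redis default when REDIS_START_URLS_AS_SET is not set), that Redis runs on localhost:6379, and that `seed_start_urls` and the example URL are hypothetical names chosen for illustration; the project may already ship its own seeding script.

```python
def seed_start_urls(client, key, urls):
    """Push each URL onto the Redis list `key`.

    `client` is any object with a redis-py style `lpush(name, *values)`
    method. Returns the list length reported by the last push, or 0 if
    `urls` is empty.
    """
    length = 0
    for url in urls:
        length = client.lpush(key, url)
    return length


# Usage with redis-py (assumes `pip install redis` and a local Redis server):
#
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   # Hypothetical seed URL; the real project may build start URLs differently.
#   seed_start_urls(client, "weibo_spider:start_urls", ["https://weibo.cn/"])
#   print(client.llen("weibo_spider:start_urls"))
```

If the key already holds URLs while the spider is running and the counter still stays at 0, the cause is more likely elsewhere (cookies, proxy, or the dupefilter).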



6 answers

  • 考古学家lx (李玺), top Python creator, 2020-04-16 19:39

    INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

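    The crawler has stalled. Set a request timeout, a delay between requests, and a retry count, and check the User-Agent and the proxy.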
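
    As a rough illustration of those settings, a fragment for the project's settings.py might look like the following. The setting names are standard Scrapy settings; the values and the User-Agent string are placeholder assumptions, not taken from the project.

```python
# settings.py fragment (sketch; all values are illustrative assumptions)

DOWNLOAD_TIMEOUT = 15   # seconds to wait before a request is abandoned
DOWNLOAD_DELAY = 2      # seconds between requests (the log already shows 2)
RETRY_ENABLED = True    # keep Scrapy's RetryMiddleware active
RETRY_TIMES = 3         # retries in addition to the first attempt

# Placeholder User-Agent; replace with one that the target site accepts.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
```

    Proxy handling is not shown here, since it depends on how the project's own middlewares set request.meta['proxy'].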

