scrapy中Spider中的变量如何传递给Middleware中的request中 5C

在获取了response响应中的内容后,需要将response的部分内容更新到cookie中。
但是获取response的内容实在自定义的parse函数中,而更新cookie是在Middleware中的process_request()中,那如何将Spider中的parse函数中的变量传递到Middleware中的process_request中呢?
下边是我的函数

图片说明

以上还请大神指点一下~~

1个回答

Csdn user default icon
上传中...
上传图片
插入图片
抄袭、复制答案,以达到刷声望分或其他目的的行为,在CSDN问答是严格禁止的,一经发现立刻封号。是时候展现真正的技术了!
其他相关推荐
利用Scrapy框架爬虫时出现报错ModuleNotFoundError: No module named 'scrapytest.NewsItems'?
``` #引入文件 import scrapy class MySpider(scrapy.Spider): #用于区别Spider name = "MySpider" #允许访问的域 allowed_domains = [] #爬取的地址 start_urls = [] #爬取方法 def parse(self, response): pass class NewsItem(scrapy.Item): #新闻标题 title = scrapy.Field() #新闻url url = scrapy.Field() #发布时间 time = scrapy.Field() #新闻内容 introduction = scrapy.Field() #定义一个item news = NewsItem() #赋值 news['title'] = "第六届年会在我校成功举办" #取值 news['title'] news.get('title') #获取全部键 news.keys() #获取全部值 news.items() import scrapy #引入容器 from scrapytest.NewsItems import NewsItem class MySpider(scrapy.Spider): #设置name name = "MySpider" #设定域名 allowed_domains = ["xgxy.hbue.edu.cn"] #填写爬取地址 start_urls = ["http://xgxy.hbue.edu.cn/2627/list.htm"] #编写爬取方法 def parse(self, response): #实例一个容器保存爬取的信息 item = NewsItem() ``` 显示错误为: ModuleNotFoundError Traceback (most recent call last) <ipython-input-17-17f981d92f22> in <module> 1 import scrapy 2 #引入容器 ----> 3 from scrapytest.NewsItems import NewsItem 4 5 class MySpider(scrapy.Spider): ModuleNotFoundError: No module named 'scrapytest.NewsItems' 希望大佬帮忙看一下,出了什么问题,万分感谢!
来个大佬教下小白scrapy怎么创建多个spider
第一个spider可以运行,第二个spider不知道怎么写了,需要怎么创建并修改哪些代码,小白初学scrapy,劳烦哪位大佬能够详细解答
scrapy爬取过程中出现重复的
# -*- coding: utf-8 -*- import scrapy class JobSpider(scrapy.Spider): name = 'job' allowed_domains = ['guazi.com'] start_urls = ['https://www.guazi.com/hz/buy/'] def parse(self, response): car_list=response.xpath('/html/body/div[6]/ul/li/a') # print(car_list) for num,i in enumerate(car_list): item={} item['name']=i.xpath('/html/body/div[6]/ul/li/a/h2/text()').extract()[num] #可以提取不同的 print(item) item['link']=i.xpath('/html/body/div[6]/ul[1]/li/a/@href').extract_first()提取的全是重复的
scrapy的变量名是规定好的吗?
我有一个疑问,就是在某些框架中 变量名好像是规定好的,比如在scrapy中蜘蛛的名字name,作用域处理函数parse(self,response)貌似response也是规定好的。。。我想请问下这其中的实现原理是什么?如果我想将name,换成其他的名字可以吗?可以的话该怎么做。还有我一直很好奇,为什么各个部件间没用显示的调用语句,数据会自动流动并处理
scrapy爬虫,按照教程,为什么没有生成对应的html文件?
教程地址:https://www.runoob.com/w3cnote/scrapy-detail.html ``` # -*- coding: utf-8 -*- import scrapy class ItcastSpider(scrapy.Spider): name = 'itcast' allowed_domains = ['itcast.cn'] start_urls = ['http://itcast.cn/'] def parse(self, response): filename = "teacher.html" open(filename,"w").write(response.body) ``` scrapy crawl itcast 之后文件夹下什么都没有生成,也没有报错, 我用的python版本为3.7
scrapy 如何处理请求与请求之间的依赖关系
众所周知,scrapy是基于twisted的爬虫框架,scrapy控制器将spiders中的所有请求都yield到调度器的请求队列,所以整个项目的所有请求并非按照我们代码写的顺序去依次请求对应URL,但实际上,有很多网页的翻页是需要带上上一页的参数才能正常返回下一页的数据的,也就是说请求必须是按照一定的规则(页码顺序等)才能获得正确的响应数据。基于这个前提,请问scrapy框架如何应对呢?
scrapy能够实现先登录再抓取吗
想用python中的scrapy框架抓取网页,但是需要先登录才能显示抓取内容,登录即为一个post操作,但是scrapy中直接通过spider模块的start_url中的url在调度器中生成request,如果需添加post参数是在调试器里添加吗,另外在哪里可以打开并编辑调试器代码? 求用过scrapy的高手解答?_
在scrapy中如何实现在多个页面上进行爬取
目标是 爬取http://download.kaoyan.com/list-1到http://download.kaoyan.com/list-1500之间的内容,每个页面中的又有翻页的list-1p1到list-1p20。目前只能实现在list1p上面爬取,应该如何修改代码跳转到list-6上面?list-2是404 ``` # -*- coding: utf-8 -*- import scrapy from Kaoyan.items import KaoyanItem class KaoyanbangSpider(scrapy.Spider): name = "Kaoyanbang" allowed_domains = ['kaoyan.com'] baseurl = 'http://download.kaoyan.com/list-' linkuseurl = 'http://download.kaoyan.com' offset = 1 pset = 1 start_urls = [baseurl+str(offset)+'p'+str(pset)] handle_httpstatus_list = [404, 500] def parse(self, response): node_list = response.xpath('//table/tr/th/span/a') for node in node_list: item = KaoyanItem() item['name'] = node.xpath('./text()').extract()[0].encode('utf - 8') item['link'] = (self.linkuseurl + node.xpath('./@href').extract()[0]).encode('utf-8') yield item while self.offset < 1500: while self.pset < 50: self.pset = self.pset + 1 url = self.baseurl+str(self.offset)+'p'+str(self.pset) y = scrapy.Request(url, callback=self.parse) yield y self.offset = self.offset + 5 ```
由scrapy异步,导致cookie在scrapy中储存过久,cookie值失效,该怎么办?
由scrapy异步,导致cookie在scrapy中储存过久,cookie值失效,该怎么办?
scrapy代码运行成功却没有保存到文件中
#**_ scrapy代码运行成功却没有保存到文件中_** **代码**: ![图片说明](https://img-ask.csdn.net/upload/201708/21/1503313313_722770.png) ``` #encoding:utf-8 #!/user/bin/python from scrapy.spider import Spider from scrapy.selector import Selector class DmozSpider(Spider): name = "dmoz" allowed_domains = ["dmoz.org"] start_urls = [ "http://tieba.baidu.com/f?kw=python3&ie=utf-8&pn=50", #"http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/" ] def parse(self, response): sel = Selector(response) sites = sel.xpath('.//a[@title]/text()').extract() for i in sites: print ('提示 这是title 提示 这是title 提示 这是title',i) yield items ``` 上面的代码出现了如下错误 ![图片说明](https://img-ask.csdn.net/upload/201708/21/1503313549_239409.png) 把 ``` yield items ``` 改为 yield sites 时,没有报错 但是 123.json文件是空的
scrapy - 怎么让scrapy框架产生的日志输出中文
我自已写的日志,中文输出正常,scrapy框架自动生成的日志记录,中文输出是一串字符串,怎么输出为中文? ![图片说明](https://img-ask.csdn.net/upload/201803/07/1520384186_193793.png)
【帮帮孩子】scrapy框架请问如何在parse函数中调用已有的参数来构造post请求获得回传的数据包呀
刚接触scrapy框架一周的菜鸟,之前都没用过框架手撸爬虫的,这次遇到了一个问题,我先请求一个网页 ``` def start_requests(self): urls=["http://www.tiku.cn/index/index/questions?cid=14&cno=1&unitid=800417&chapterid=701354&typeid=600122&thrknowid=700137"] for url in urls: yield scrapy.Request(url=url,callback=self.parse) ``` 然后传给parse方法获得了question_ID这个关键参数,然后我想在这里面直接利用这个question_id这个参数构造post请求获得它回传的json数据包并保存在 item['正确答案']之中,请问我要如何实现?,谢谢大佬百忙之中抽空回答我的疑问,谢谢! ``` def parse(self, response): item = TikuItem () for i in range(1,11): QUESTION_ID=str(response.xpath('(/html/body/div[4]/div[2]/div[2]/div['+str(i)+']/div[@class="q-analysis text-l"]/@id)').extract_first()[3:]) item['question_ID']=QUESTION_ID ``` 这是我的items.py文件 ``` class TikuItem(scrapy.Item): # define the fields for your item here like: # name = scrapy.Field() question_ID=scrapy.Field()#题号 correct_answer=scrapy.Field()#正确答案 ```
scrapy 在重定向的时候会丢失middlewares中设置的header吗?
问题是,在使用某动态转发的代理时,客服回复:“是因为请求需要重定向的url但是本身用的包使用代理自动重定向请求的时候会丢失hearder,这个时候就需要用户,禁止重定向,然后根据返回的状态码301/302的时候,从响应头的Location中获取新的请求url” 想问下scrapy 在重定向的时候会丢失middlewares中设置的header吗?如果是的话,怎么设置不“丢失”呢? 因为scrapy都是通过yield Request来请求的,在这里也没法判断状态码和获取重定向之后的URL吧?
请问scrapy为什么会爬取失败
C:\Users\Administrator\Desktop\新建文件夹\xiaozhu>python -m scrapy crawl xiaozhu 2019-10-26 11:43:11 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: xiaozhu) 2019-10-26 11:43:11 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9 .5, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.7.0, Python 3.5.3 (v 3.5.3:1880cb95a742, Jan 16 2017, 15:51:26) [MSC v.1900 32 bit (Intel)], pyOpenSS L 19.0.0 (OpenSSL 1.1.1c 28 May 2019), cryptography 2.7, Platform Windows-7-6.1 .7601-SP1 2019-10-26 11:43:11 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'xi aozhu', 'SPIDER_MODULES': ['xiaozhu.spiders'], 'NEWSPIDER_MODULE': 'xiaozhu.spid ers'} 2019-10-26 11:43:11 [scrapy.extensions.telnet] INFO: Telnet Password: c61bda45d6 3b8138 2019-10-26 11:43:11 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats'] 2019-10-26 11:43:12 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats'] 2019-10-26 11:43:12 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware'] 2019-10-26 11:43:12 [scrapy.middleware] INFO: Enabled item pipelines: [] 2019-10-26 11:43:12 [scrapy.core.engine] INFO: Spider opened 2019-10-26 11:43:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pag es/min), scraped 0 items (at 0 items/min) 2019-10-26 11:43:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 2019-10-26 11:43:12 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting ( 307) to <GET https://bizverify.xiaozhu.com?slideRedirect=https%3A%2F%2Fbj.xiaozh u.com%2Ffangzi%2F125535477903.html> from <GET http://bj.xiaozhu.com/fangzi/12553 5477903.html> 2019-10-26 11:43:12 [scrapy.core.engine] DEBUG: Crawled (400) <GET https://bizve rify.xiaozhu.com?slideRedirect=https%3A%2F%2Fbj.xiaozhu.com%2Ffangzi%2F125535477 903.html> (referer: None) 2019-10-26 11:43:12 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://bizverify.xiaozhu.com?slideRedirect=https%3A%2F%2Fbj.xiaozhu.com%2 Ffangzi%2F125535477903.html>: HTTP status code is not handled or not allowed 2019-10-26 11:43:12 [scrapy.core.engine] INFO: Closing spider (finished) 2019-10-26 11:43:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 529, 'downloader/request_count': 2, 'downloader/request_method_count/GET': 2, 'downloader/response_bytes': 725, 'downloader/response_count': 2, 'downloader/response_status_count/307': 1, 'downloader/response_status_count/400': 1, 'elapsed_time_seconds': 0.427734, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 10, 26, 3, 43, 12, 889648), 'httperror/response_ignored_count': 1, 'httperror/response_ignored_status_count/400': 1, 'log_count/DEBUG': 2, 'log_count/INFO': 11, 'response_received_count': 1, 'scheduler/dequeued': 2, 'scheduler/dequeued/memory': 2, 'scheduler/enqueued': 2, 'scheduler/enqueued/memory': 2, 'start_time': datetime.datetime(2019, 10, 26, 3, 43, 12, 461914)} 2019-10-26 11:43:12 [scrapy.core.engine] INFO: Spider closed (finished)
scrapy 一个项目 多爬虫 如何分别用自身的爬虫名 命名产生的log文件?
在一个scrapy项目中,我写了几个爬虫,不管是同时运行还是分别顺序运行, 我想每个爬虫分别产生独立的log文件,要有分辨性,就想用爬虫名来作为 log文件名。如果不自己写log设置模块,在settings里能实现吗?谢谢各位 大佬解答一下。
一个百度拇指医生爬虫,想要先实现爬取某个问题的所有链接,但是爬不出来东西。求各位大神帮忙看一下这是为什么?
#写在前面的话 在这个爬虫里我想实现把百度拇指医生里关于“咳嗽”的链接全部爬取下来,下一步要进行的是把爬取到的每个链接里的items里面的内容爬取下来,但是我在第一步就卡住了,求各位大神帮我看一下吧。之前刚刚发了一篇问答,但是不知道怎么回事儿,现在找不到了,(貌似是被删了...?)救救小白吧!感激不尽! 这个是我的爬虫的结构 ![图片说明](https://img-ask.csdn.net/upload/201911/27/1574787999_274479.png) ##ks: ``` # -*- coding: utf-8 -*- import scrapy from kesou.items import KesouItem from scrapy.selector import Selector from scrapy.spiders import Spider from scrapy.http import Request ,FormRequest import pymongo class KsSpider(scrapy.Spider): name = 'ks' allowed_domains = ['kesou,baidu.com'] start_urls = ['https://www.baidu.com/s?wd=%E5%92%B3%E5%97%BD&pn=0&oq=%E5%92%B3%E5%97%BD&ct=2097152&ie=utf-8&si=muzhi.baidu.com&rsv_pq=980e0c55000e2402&rsv_t=ed3f0i5yeefxTMskgzim00cCUyVujMRnw0Vs4o1%2Bo%2Bohf9rFXJvk%2FSYX%2B1M'] def parse(self, response): item = KesouItem() contents = response.xpath('.//h3[@class="t"]') for content in contents: url = content.xpath('.//a/@href').extract()[0] item['url'] = url yield item if self.offset < 760: self.offset += 10 yield scrapy.Request(url = "https://www.baidu.com/s?wd=%E5%92%B3%E5%97%BD&pn=" + str(self.offset) + "&oq=%E5%92%B3%E5%97%BD&ct=2097152&ie=utf-8&si=muzhi.baidu.com&rsv_pq=980e0c55000e2402&rsv_t=ed3f0i5yeefxTMskgzim00cCUyVujMRnw0Vs4o1%2Bo%2Bohf9rFXJvk%2FSYX%2B1M",callback=self.parse,dont_filter=True) ``` ##items: ``` # -*- coding: utf-8 -*- # Define here the models for your scraped items # # See documentation in: # https://docs.scrapy.org/en/latest/topics/items.html import scrapy class KesouItem(scrapy.Item): # 问题ID question_ID = scrapy.Field() # 问题描述 question = scrapy.Field() # 医生回答发表时间 answer_pubtime = scrapy.Field() # 问题详情 description = scrapy.Field() # 医生姓名 doctor_name = scrapy.Field() # 医生职位 doctor_title = scrapy.Field() # 医生所在医院 hospital = scrapy.Field() ``` ##middlewares: ``` # -*- coding: utf-8 -*- # Define here the models for your spider middleware # # See documentation in: # https://docs.scrapy.org/en/latest/topics/spider-middleware.html from scrapy import signals class KesouSpiderMiddleware(object): # Not all methods need to be defined. If a method is not defined, # scrapy acts as if the spider middleware does not modify the # passed objects. @classmethod def from_crawler(cls, crawler): # This method is used by Scrapy to create your spiders. s = cls() crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) return s def process_spider_input(self, response, spider): # Called for each response that goes through the spider # middleware and into the spider. # Should return None or raise an exception. return None def process_spider_output(self, response, result, spider): # Called with the results returned from the Spider, after # it has processed the response. # Must return an iterable of Request, dict or Item objects. for i in result: yield i def process_spider_exception(self, response, exception, spider): # Called when a spider or process_spider_input() method # (from other spider middleware) raises an exception. # Should return either None or an iterable of Request, dict # or Item objects. pass def process_start_requests(self, start_requests, spider): # Called with the start requests of the spider, and works # similarly to the process_spider_output() method, except # that it doesn’t have a response associated. # Must return only requests (not items). for r in start_requests: yield r def spider_opened(self, spider): spider.logger.info('Spider opened: %s' % spider.name) class KesouDownloaderMiddleware(object): # Not all methods need to be defined. If a method is not defined, # scrapy acts as if the downloader middleware does not modify the # passed objects. @classmethod def from_crawler(cls, crawler): # This method is used by Scrapy to create your spiders. s = cls() crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) return s def process_request(self, request, spider): # Called for each request that goes through the downloader # middleware. # Must either: # - return None: continue processing this request # - or return a Response object # - or return a Request object # - or raise IgnoreRequest: process_exception() methods of # installed downloader middleware will be called return None def process_response(self, request, response, spider): # Called with the response returned from the downloader. # Must either; # - return a Response object # - return a Request object # - or raise IgnoreRequest return response def process_exception(self, request, exception, spider): # Called when a download handler or a process_request() # (from other downloader middleware) raises an exception. # Must either: # - return None: continue processing this exception # - return a Response object: stops process_exception() chain # - return a Request object: stops process_exception() chain pass def spider_opened(self, spider): spider.logger.info('Spider opened: %s' % spider.name) ``` ##piplines: ``` # -*- coding: utf-8 -*- # Define your item pipelines here # # Don't forget to add your pipeline to the ITEM_PIPELINES setting # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html import pymongo from scrapy.utils.project import get_project_settings settings = get_project_settings() class KesouPipeline(object): def __init__(self): host = settings["MONGODB_HOST"] port = settings["MONGODB_PORT"] dbname = settings["MONGODB_DBNAME"] sheetname= settings["MONGODB_SHEETNAME"] # 创建MONGODB数据库链接 client = pymongo.MongoClient(host = host, port = port) # 指定数据库 mydb = client[dbname] # 存放数据的数据库表名 self.sheet = mydb[sheetname] def process_item(self, item, spider): data = dict(item) self.sheet.insert(data) return item ``` ##settings: ``` # -*- coding: utf-8 -*- # Scrapy settings for kesou project # # For simplicity, this file contains only settings considered important or # commonly used. You can find more settings consulting the documentation: # # https://docs.scrapy.org/en/latest/topics/settings.html # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html # https://docs.scrapy.org/en/latest/topics/spider-middleware.html BOT_NAME = 'kesou' SPIDER_MODULES = ['kesou.spiders'] NEWSPIDER_MODULE = 'kesou.spiders' # Crawl responsibly by identifying yourself (and your website) on the user-agent #USER_AGENT = 'kesou (+http://www.yourdomain.com)' # Obey robots.txt rules ROBOTSTXT_OBEY = False # Configure maximum concurrent requests performed by Scrapy (default: 16) #CONCURRENT_REQUESTS = 32 # Configure a delay for requests for the same website (default: 0) # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay # See also autothrottle settings and docs #DOWNLOAD_DELAY = 3 # The download delay setting will honor only one of: #CONCURRENT_REQUESTS_PER_DOMAIN = 16 #CONCURRENT_REQUESTS_PER_IP = 16 # Disable cookies (enabled by default) COOKIES_ENABLED = False # Disable Telnet Console (enabled by default) #TELNETCONSOLE_ENABLED = False USER_AGENT="Mozilla/5.0 (Windows NT 10.0; WOW64; rv:67.0) Gecko/20100101 Firefox/67.0" # Override the default request headers: #DEFAULT_REQUEST_HEADERS = { # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', # 'Accept-Language': 'en', #} # Enable or disable spider middlewares # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html #SPIDER_MIDDLEWARES = { # 'kesou.middlewares.KesouSpiderMiddleware': 543, #} # Enable or disable downloader middlewares # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html #DOWNLOADER_MIDDLEWARES = { # 'kesou.middlewares.KesouDownloaderMiddleware': 543, #} # Enable or disable extensions # See https://docs.scrapy.org/en/latest/topics/extensions.html #EXTENSIONS = { # 'scrapy.extensions.telnet.TelnetConsole': None, #} # Configure item pipelines # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html ITEM_PIPELINES = { 'kesou.pipelines.KesouPipeline': 300, } # MONGODB 主机名 MONGODB_HOST = "127.0.0.1" # MONGODB 端口号 MONGODB_PORT = 27017 # 数据库名称 MONGODB_DBNAME = "ks" # 存放数据的表名称 MONGODB_SHEETNAME = "ks_urls" # Enable and configure the AutoThrottle extension (disabled by default) # See https://docs.scrapy.org/en/latest/topics/autothrottle.html #AUTOTHROTTLE_ENABLED = True # The initial download delay #AUTOTHROTTLE_START_DELAY = 5 # The maximum download delay to be set in case of high latencies #AUTOTHROTTLE_MAX_DELAY = 60 # The average number of requests Scrapy should be sending in parallel to # each remote server #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 # Enable showing throttling stats for every response received: #AUTOTHROTTLE_DEBUG = False # Enable and configure HTTP caching (disabled by default) # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings #HTTPCACHE_ENABLED = True #HTTPCACHE_EXPIRATION_SECS = 0 #HTTPCACHE_DIR = 'httpcache' #HTTPCACHE_IGNORE_HTTP_CODES = [] #HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage' ``` ##run.py: ``` # -*- coding: utf-8 -*- from scrapy import cmdline cmdline.execute("scrapy crawl ks".split()) ``` ##这个是运行出来的结果: ``` PS D:\scrapy_project\kesou> scrapy crawl ks 2019-11-27 00:14:17 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: kesou) 2019-11-27 00:14:17 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twis.7.0, Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryphy 2.6.1, Platform Windows-10-10.0.18362-SP0 2019-11-27 00:14:17 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'kesou', 'COOKIES_ENABLED': False, 'NEWSPIDER_MODULE': 'spiders', 'SPIDER_MODULES': ['kesou.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:67.0) Gecko/20100101 Firefox/67 2019-11-27 00:14:17 [scrapy.extensions.telnet] INFO: Telnet Password: 051629c46f34abdf 2019-11-27 00:14:17 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats'] 2019-11-27 00:14:19 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats'] 2019-11-27 00:14:19 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware'] 2019-11-27 00:14:19 [scrapy.middleware] INFO: Enabled item pipelines: ['kesou.pipelines.KesouPipeline'] 2019-11-27 00:14:19 [scrapy.core.engine] INFO: Spider opened 2019-11-27 00:14:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2019-11-27 00:14:19 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 2019-11-27 00:14:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.baidu.com/s?wd=%E5%92%B3%E5%97%BD&pn=0&oq=%E5%92%B3%E5&ct=2097152&ie=utf-8&si=muzhi.baidu.com&rsv_pq=980e0c55000e2402&rsv_t=ed3f0i5yeefxTMskgzim00cCUyVujMRnw0Vs4o1%2Bo%2Bohf9rFXJvk%2FSYX% (referer: None) 2019-11-27 00:14:20 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.baidu.com/s?wd=%E5%92%B3%E5%97%BD&pn=0&oq=%B3%E5%97%BD&ct=2097152&ie=utf-8&si=muzhi.baidu.com&rsv_pq=980e0c55000e2402&rsv_t=ed3f0i5yeefxTMskgzim00cCUyVujMRnw0Vs4o1%2Bo%2Bohf9rFFSYX%2B1M> (referer: None) Traceback (most recent call last): File "d:\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback yield next(it) File "d:\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable for r in iterable: File "d:\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output for x in result: File "d:\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable for r in iterable: File "d:\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr> return (_set_referer(r) for r in result or ()) File "d:\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable for r in iterable: File "d:\anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr> return (r for r in result or () if _filter(r)) File "d:\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable for r in iterable: File "d:\anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr> return (r for r in result or () if _filter(r)) File "D:\scrapy_project\kesou\kesou\spiders\ks.py", line 19, in parse item['url'] = url File "d:\anaconda3\lib\site-packages\scrapy\item.py", line 73, in __setitem__ (self.__class__.__name__, key)) KeyError: 'KesouItem does not support field: url' 2019-11-27 00:14:20 [scrapy.core.engine] INFO: Closing spider (finished) 2019-11-27 00:14:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 438, 'downloader/request_count': 1, 'downloader/request_method_count/GET': 1, 'downloader/response_bytes': 68368, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'elapsed_time_seconds': 0.992207, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 11, 26, 16, 14, 20, 855804), 'log_count/DEBUG': 1, 2019-11-27 00:14:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 438, 'downloader/request_count': 1, 'downloader/request_method_count/GET': 1, 'downloader/response_bytes': 68368, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'elapsed_time_seconds': 0.992207, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 11, 26, 16, 14, 20, 855804), 'log_count/DEBUG': 1, 'log_count/ERROR': 1, 'log_count/INFO': 10, 'response_received_count': 1, 'scheduler/dequeued': 1, 'scheduler/dequeued/memory': 1, 'scheduler/enqueued': 1, 'scheduler/enqueued/memory': 1, 'spider_exceptions/KeyError': 1, 'start_time': datetime.datetime(2019, 11, 26, 16, 14, 19, 863597)} 2019-11-27 00:14:21 [scrapy.core.engine] INFO: Spider closed (finished) ```
scrapy出错twisted.python.failure.Failure twisted.internet.error
我是跟这网上视频写的 ``` import scrapy class QsbkSpider(scrapy.Spider): name = 'qsbk' allowed_domains = ['www.qiushibaike.com/'] start_urls = ['https://www.qiushibaike.com/text/'] def parse(self, response): print('='*10) print(response) print('*'*10) ``` 出现 ![图片说明](https://img-ask.csdn.net/upload/201908/06/1565066344_519008.png) 请求头改了还是不行,用requests库爬取又可以
scrapy-redis 多个spider监听redis,只有一个能监听到
# 问题描述 >使用scrapy-redis进行分布式抓取时,遇到了一个很奇怪的问题,有俩台机器,Windows电脑A,和Ubuntu电脑B,redis server部署在 Windows电脑A上,在电脑A,B启动爬虫后,俩只爬虫都进入监听状态,在redis中进行 url的lpush操作,奇怪的事情发生了,电脑A,或者电脑B中只有一台电脑能监听到 redis,但是具体哪个能够监听到这个很随机,有时是电脑A,有时是电脑B,我能确保,电脑A和B都是连接上了 redis的 ## 运行环境 • scrapy 1.5.0 • scrapy-redis 0.6.8 • redis.py 2.10.6 • redis-server(windows x64) 3.2.100 ### 运行截图 #### 分别启动俩个spider ![图片说明](https://img-ask.csdn.net/upload/201801/21/1516546011_610430.jpg) #### 第一次进行url的lpush操作,结果如下 ![图片说明](https://img-ask.csdn.net/upload/201801/21/1516546036_920518.jpg) >这时只有爬虫A监听到了 redis的操作,执行抓取逻辑,而爬虫B仍然处于监听状态 ###手动停止俩只spider,并清空redis数据,再次开启爬虫,执行lpush操作,结果如下 ![图片说明](https://img-ask.csdn.net/upload/201801/21/1516546092_288308.jpg) >这时,爬虫B却监听到了redis的操作,执行抓取逻辑,而爬虫A仍处于监听状态 还有一张是lpush后 redis中的数据情况 ![图片说明](https://img-ask.csdn.net/upload/201801/21/1516546114_982429.png) 被这个问题困扰了2天了,这俩天一直没怎么睡,查了好多资料,试了好多办法,都不行,中间我把redis服务放在了电脑c上,但是还行不行。 希望前辈们,能指点一二 问题解决后请 采纳答案 进行终结;如果自己找到解决方案,你可以 自问自答 并采纳。
scrapy运行爬虫时报错Missing scheme in request url
scrapy刚入门小白一枚。用网上的案例代码来玩一玩,案例是http://blog.csdn.net/czl389/article/details/77278166 中的爬取嘻哈歌词。这个案例下有三只爬虫,分别是songurls,lyrics和songinfo。我用songurls爬虫能从虾米音乐上爬取了url并保存在SongUrls.csv中,但是在用lyrics爬虫的时候会报错。信息如下 **D:\xiami2\xiami2>scrapy crawl lyrics -o Lyrics.csv 2017-10-21 21:13:29 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: xiami2) 2017-10-21 21:13:29 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'xiami2.spiders', 'USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 4.0; Trident/3.0)', 'FEED_URI': 'Lyrics.csv', 'FEED_FORMAT': 'csv', 'DOWNLOAD_DELAY': 0.2, 'SPIDER_MODULES': ['xiami2.spiders'], 'BOT_NAME': 'xiami2'} 2017-10-21 21:13:29 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats'] 2017-10-21 21:13:31 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats'] 2017-10-21 21:13:31 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware'] 2017-10-21 21:13:31 [scrapy.middleware] INFO: Enabled item pipelines: ['xiami2.pipelines.Xiami2Pipeline'] 2017-10-21 21:13:31 [scrapy.core.engine] INFO: Spider opened 2017-10-21 21:13:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2017-10-21 21:13:31 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023 2017-10-21 21:13:31 [scrapy.core.engine] ERROR: Error while obtaining start requests Traceback (most recent call last): File "d:\python3.5\lib\site-packages\scrapy\core\engine.py", line 127, in _next_request request = next(slot.start_requests) File "d:\python3.5\lib\site-packages\scrapy\spiders\__init__.py", line 83, in start_requests yield Request(url, dont_filter=True) File "d:\python3.5\lib\site-packages\scrapy\http\request\__init__.py", line 25, in __init__ self._set_url(url) File "d:\python3.5\lib\site-packages\scrapy\http\request\__init__.py", line 58, in _set_url raise ValueError('Missing scheme in request url: %s' % self._url) ValueError: Missing scheme in request url: 2017-10-21 21:13:31 [scrapy.core.engine] INFO: Closing spider (finished) 2017-10-21 21:13:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'finish_reason': 'finished', 'finish_time': datetime.datetime(2017, 10, 21, 13, 13, 31, 567323), 'log_count/DEBUG': 1, 'log_count/ERROR': 1, 'log_count/INFO': 7, 'start_time': datetime.datetime(2017, 10, 21, 13, 13, 31, 536236)} 2017-10-21 21:13:31 [scrapy.core.engine] INFO: Spider closed (finished) _------------------------------分割线--------------------------------------_ 我去查看了一下_init_.py,发现如下语句。 if ':' not in self._url: raise ValueError('Missing scheme in request url: %s' % self._url) 网上的解决方法看了一些,都没有能解决我的问题的,因此在此讨教,望大家指点一二(真没C币了)。提问次数不多,若有格式方面缺陷还请包含。 另附上代码。 #songurls.py import scrapy import re from scrapy.spiders import CrawlSpider, Rule from ..items import SongUrlItem class SongurlsSpider(scrapy.Spider): name = 'songurls' allowed_domains = ['xiami.com'] #将page/1到page/401,这些链接放进start_urls start_url_list=[] url_fixed='http://www.xiami.com/song/tag/Hip-Hop/page/' #将range范围扩大为1-401,获得所有页面 for i in range(1,402): start_url_list.extend([url_fixed+str(i)]) start_urls=start_url_list def parse(self,response): urls=response.xpath('//*[@id="wrapper"]/div[2]/div/div/div[2]/table/tbody/tr/td[2]/a[1]/@href').extract() for url in urls: song_url=response.urljoin(url) url_item=SongUrlItem() url_item['song_url']=song_url yield url_item ------------------------------分割线-------------------------------------- #lyrics.py import scrapy import re class LyricsSpider(scrapy.Spider): name = 'lyrics' allowed_domains = ['xiami.com'] song_url_file='SongUrls.csv' def __init__(self, *args, **kwargs): #从song_url.csv 文件中读取得到所有歌曲url f = open(self.song_url_file,"r") lines = f.readlines() #这里line[:-1]的含义是每行末尾都是一个换行符,要去掉 #这里in lines[1:]的含义是csv第一行是字段名称,要去掉 song_url_list=[line[:-1] for line in lines[1:]] f.close() while '\n' in song_url_list: song_url_list.remove('\n') self.start_urls = song_url_list#[:100]#删除[:100]之后爬取全部数据 def parse(self,response): lyric_lines=response.xpath('//*[@id="lrc"]/div[1]/text()').extract() lyric='' for lyric_line in lyric_lines: lyric+=lyric_line #print lyric lyricItem=LyricItem() lyricItem['lyric']=lyric lyricItem['song_url']=response.url yield lyricItem songinfo因为还没有用到所以不重要。 ------------------------------分割线-------------------------------------- #items.py import scrapy class SongUrlItem(scrapy.Item): # define the fields for your item here like: # name = scrapy.Field() song_url=scrapy.Field() #歌曲链接 class LyricItem(scrapy.Item): # define the fields for your item here like: # name = scrapy.Field() lyric=scrapy.Field() #歌词 song_url=scrapy.Field() #歌曲链接 class SongInfoItem(scrapy.Item): # define the fields for your item here like: # name = scrapy.Field() song_url=scrapy.Field() #歌曲链接 song_title=scrapy.Field() #歌名 album=scrapy.Field() #专辑 #singer=scrapy.Field() #歌手 language=scrapy.Field() #语种 ------------------------------分割线-------------------------------------- 在middleware下加了几行: sleep_seconds = 0.2 # 模拟点击后休眠3秒,给出浏览器取得响应内容的时间 default_sleep_seconds = 1 # 无动作请求休眠的时间 def process_request(self, request, spider): spider.logger.info('--------Spider request processed: %s' % spider.name) page = None driver = webdriver.PhantomJS() spider.logger.info('--------request.url: %s' % request.url) driver.get(request.url) driver.implicitly_wait(0.2) # 仅休眠数秒加载页面后返回内容 time.sleep(self.sleep_seconds) page = driver.page_source driver.close() return HtmlResponse(request.url, body=page, encoding='utf-8', request=request) ------------------------------分割线-------------------------------------- setting中加了几行也改了几行: from faker import Factory f = Factory.create() USER_AGENT = f.user_agent() DOWNLOAD_DELAY = 0.2 DEFAULT_REQUEST_HEADERS = { 'Host': 'www.xiami.com', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br', 'Accept-Language': 'zh-CN,zh;q=0.8', 'Cache-Control': 'no-cache', 'Connection': 'Keep-Alive', } ITEM_PIPELINES = { 'xiami2.pipelines.Xiami2Pipeline': 300, }
Kafka实战(三) - Kafka的自我修养与定位
Apache Kafka是消息引擎系统,也是一个分布式流处理平台(Distributed Streaming Platform) Kafka是LinkedIn公司内部孵化的项目。LinkedIn最开始有强烈的数据强实时处理方面的需求,其内部的诸多子系统要执行多种类型的数据处理与分析,主要包括业务系统和应用程序性能监控,以及用户行为数据处理等。 遇到的主要问题: 数据正确性不足 数据的收集主要...
volatile 与 synchronize 详解
Java支持多个线程同时访问一个对象或者对象的成员变量,由于每个线程可以拥有这个变量的拷贝(虽然对象以及成员变量分配的内存是在共享内存中的,但是每个执行的线程还是可以拥有一份拷贝,这样做的目的是加速程序的执行,这是现代多核处理器的一个显著特性),所以程序在执行过程中,一个线程看到的变量并不一定是最新的。 volatile 关键字volatile可以用来修饰字段(成员变量),就是告知程序任何对该变量...
Java学习的正确打开方式
在博主认为,对于入门级学习java的最佳学习方法莫过于视频+博客+书籍+总结,前三者博主将淋漓尽致地挥毫于这篇博客文章中,至于总结在于个人,实际上越到后面你会发现学习的最好方式就是阅读参考官方文档其次就是国内的书籍,博客次之,这又是一个层次了,这里暂时不提后面再谈。博主将为各位入门java保驾护航,各位只管冲鸭!!!上天是公平的,只要不辜负时间,时间自然不会辜负你。 何谓学习?博主所理解的学习,它是一个过程,是一个不断累积、不断沉淀、不断总结、善于传达自己的个人见解以及乐于分享的过程。
程序员必须掌握的核心算法有哪些?
由于我之前一直强调数据结构以及算法学习的重要性,所以就有一些读者经常问我,数据结构与算法应该要学习到哪个程度呢?,说实话,这个问题我不知道要怎么回答你,主要取决于你想学习到哪些程度,不过针对这个问题,我稍微总结一下我学过的算法知识点,以及我觉得值得学习的算法。这些算法与数据结构的学习大多数是零散的,并没有一本把他们全部覆盖的书籍。下面是我觉得值得学习的一些算法以及数据结构,当然,我也会整理一些看过...
有哪些让程序员受益终生的建议
从业五年多,辗转两个大厂,出过书,创过业,从技术小白成长为基层管理,联合几个业内大牛回答下这个问题,希望能帮到大家,记得帮我点赞哦。 敲黑板!!!读了这篇文章,你将知道如何才能进大厂,如何实现财务自由,如何在工作中游刃有余,这篇文章很长,但绝对是精品,记得帮我点赞哦!!!! 一腔肺腑之言,能看进去多少,就看你自己了!!! 目录: 在校生篇: 为什么要尽量进大厂? 如何选择语言及方...
大学四年自学走来,这些私藏的实用工具/学习网站我贡献出来了
大学四年,看课本是不可能一直看课本的了,对于学习,特别是自学,善于搜索网上的一些资源来辅助,还是非常有必要的,下面我就把这几年私藏的各种资源,网站贡献出来给你们。主要有:电子书搜索、实用工具、在线视频学习网站、非视频学习网站、软件下载、面试/求职必备网站。 注意:文中提到的所有资源,文末我都给你整理好了,你们只管拿去,如果觉得不错,转发、分享就是最大的支持了。 一、电子书搜索 对于大部分程序员...
linux系列之常用运维命令整理笔录
本博客记录工作中需要的linux运维命令,大学时候开始接触linux,会一些基本操作,可是都没有整理起来,加上是做开发,不做运维,有些命令忘记了,所以现在整理成博客,当然vi,文件操作等就不介绍了,慢慢积累一些其它拓展的命令,博客不定时更新 free -m 其中:m表示兆,也可以用g,注意都要小写 Men:表示物理内存统计 total:表示物理内存总数(total=used+free) use...
比特币原理详解
一、什么是比特币 比特币是一种电子货币,是一种基于密码学的货币,在2008年11月1日由中本聪发表比特币白皮书,文中提出了一种去中心化的电子记账系统,我们平时的电子现金是银行来记账,因为银行的背后是国家信用。去中心化电子记账系统是参与者共同记账。比特币可以防止主权危机、信用风险。其好处不多做赘述,这一层面介绍的文章很多,本文主要从更深层的技术原理角度进行介绍。 二、问题引入 假设现有4个人...
GitHub开源史上最大规模中文知识图谱
近日,一直致力于知识图谱研究的 OwnThink 平台在 Github 上开源了史上最大规模 1.4 亿中文知识图谱,其中数据是以(实体、属性、值),(实体、关系、实体)混合的形式组织,数据格式采用 csv 格式。 到目前为止,OwnThink 项目开放了对话机器人、知识图谱、语义理解、自然语言处理工具。知识图谱融合了两千五百多万的实体,拥有亿级别的实体属性关系,机器人采用了基于知识图谱的语义感...
程序员接私活怎样防止做完了不给钱?
首先跟大家说明一点,我们做 IT 类的外包开发,是非标品开发,所以很有可能在开发过程中会有这样那样的需求修改,而这种需求修改很容易造成扯皮,进而影响到费用支付,甚至出现做完了项目收不到钱的情况。 那么,怎么保证自己的薪酬安全呢? 我们在开工前,一定要做好一些证据方面的准备(也就是“讨薪”的理论依据),这其中最重要的就是需求文档和验收标准。一定要让需求方提供这两个文档资料作为开发的基础。之后开发...
网页实现一个简单的音乐播放器(大佬别看。(⊙﹏⊙))
今天闲着无事,就想写点东西。然后听了下歌,就打算写个播放器。 于是乎用h5 audio的加上js简单的播放器完工了。 演示地点演示 html代码如下` music 这个年纪 七月的风 音乐 ` 然后就是css`*{ margin: 0; padding: 0; text-decoration: none; list-...
微信支付崩溃了,但是更让马化腾和张小龙崩溃的竟然是……
loonggg读完需要3分钟速读仅需1分钟事件还得还原到昨天晚上,10 月 29 日晚上 20:09-21:14 之间,微信支付发生故障,全国微信支付交易无法正常进行。然...
Python十大装B语法
Python 是一种代表简单思想的语言,其语法相对简单,很容易上手。不过,如果就此小视 Python 语法的精妙和深邃,那就大错特错了。本文精心筛选了最能展现 Python 语法之精妙的十个知识点,并附上详细的实例代码。如能在实战中融会贯通、灵活使用,必将使代码更为精炼、高效,同时也会极大提升代码B格,使之看上去更老练,读起来更优雅。
数据库优化 - SQL优化
以实际SQL入手,带你一步一步走上SQL优化之路!
2019年11月中国大陆编程语言排行榜
2019年11月2日,我统计了某招聘网站,获得有效程序员招聘数据9万条。针对招聘信息,提取编程语言关键字,并统计如下: 编程语言比例 rank pl_ percentage 1 java 33.62% 2 cpp 16.42% 3 c_sharp 12.82% 4 javascript 12.31% 5 python 7.93% 6 go 7.25% 7 p...
通俗易懂地给女朋友讲:线程池的内部原理
餐盘在灯光的照耀下格外晶莹洁白,女朋友拿起红酒杯轻轻地抿了一小口,对我说:“经常听你说线程池,到底线程池到底是个什么原理?”
《奇巧淫技》系列-python!!每天早上八点自动发送天气预报邮件到QQ邮箱
将代码部署服务器,每日早上定时获取到天气数据,并发送到邮箱。 也可以说是一个小型人工智障。 知识可以运用在不同地方,不一定非是天气预报。
经典算法(5)杨辉三角
杨辉三角 是经典算法,这篇博客对它的算法思想进行了讲解,并有完整的代码实现。
英特尔不为人知的 B 面
从 PC 时代至今,众人只知在 CPU、GPU、XPU、制程、工艺等战场中,英特尔在与同行硬件芯片制造商们的竞争中杀出重围,且在不断的成长进化中,成为全球知名的半导体公司。殊不知,在「刚硬」的背后,英特尔「柔性」的软件早已经做到了全方位的支持与支撑,并持续发挥独特的生态价值,推动产业合作共赢。 而对于这一不知人知的 B 面,很多人将其称之为英特尔隐形的翅膀,虽低调,但是影响力却不容小觑。 那么,在...
腾讯算法面试题:64匹马8个跑道需要多少轮才能选出最快的四匹?
昨天,有网友私信我,说去阿里面试,彻底的被打击到了。问了为什么网上大量使用ThreadLocal的源码都会加上private static?他被难住了,因为他从来都没有考虑过这个问题。无独有偶,今天笔者又发现有网友吐槽了一道腾讯的面试题,我们一起来看看。 腾讯算法面试题:64匹马8个跑道需要多少轮才能选出最快的四匹? 在互联网职场论坛,一名程序员发帖求助到。二面腾讯,其中一个算法题:64匹...
面试官:你连RESTful都不知道我怎么敢要你?
干货,2019 RESTful最贱实践
刷了几千道算法题,这些我私藏的刷题网站都在这里了!
遥想当年,机缘巧合入了 ACM 的坑,周边巨擘林立,从此过上了"天天被虐似死狗"的生活… 然而我是谁,我可是死狗中的战斗鸡,智力不够那刷题来凑,开始了夜以继日哼哧哼哧刷题的日子,从此"读题与提交齐飞, AC 与 WA 一色 ",我惊喜的发现被题虐既刺激又有快感,那一刻我泪流满面。这么好的事儿作为一个正直的人绝不能自己独享,经过激烈的颅内斗争,我决定把我私藏的十几个 T 的,阿不,十几个刷题网...
为啥国人偏爱Mybatis,而老外喜欢Hibernate/JPA呢?
关于SQL和ORM的争论,永远都不会终止,我也一直在思考这个问题。昨天又跟群里的小伙伴进行了一番讨论,感触还是有一些,于是就有了今天这篇文。 声明:本文不会下关于Mybatis和JPA两个持久层框架哪个更好这样的结论。只是摆事实,讲道理,所以,请各位看官勿喷。 一、事件起因 关于Mybatis和JPA孰优孰劣的问题,争论已经很多年了。一直也没有结论,毕竟每个人的喜好和习惯是大不相同的。我也看...
白话阿里巴巴Java开发手册高级篇
不久前,阿里巴巴发布了《阿里巴巴Java开发手册》,总结了阿里巴巴内部实际项目开发过程中开发人员应该遵守的研发流程规范,这些流程规范在一定程度上能够保证最终的项目交付质量,通过在时间中总结模式,并推广给广大开发人员,来避免研发人员在实践中容易犯的错误,确保最终在大规模协作的项目中达成既定目标。 无独有偶,笔者去年在公司里负责升级和制定研发流程、设计模板、设计标准、代码标准等规范,并在实际工作中进行...
SQL-小白最佳入门sql查询一
不要偷偷的查询我的个人资料,即使你再喜欢我,也不要这样,真的不好;
项目中的if else太多了,该怎么重构?
介绍 最近跟着公司的大佬开发了一款IM系统,类似QQ和微信哈,就是聊天软件。我们有一部分业务逻辑是这样的 if (msgType = "文本") { // dosomething } else if(msgType = "图片") { // doshomething } else if(msgType = "视频") { // doshomething } else { // doshom...
Nginx 原理和架构
Nginx 是一个免费的,开源的,高性能的 HTTP 服务器和反向代理,以及 IMAP / POP3 代理服务器。Nginx 以其高性能,稳定性,丰富的功能,简单的配置和低资源消耗而闻名。 Nginx 的整体架构 Nginx 里有一个 master 进程和多个 worker 进程。master 进程并不处理网络请求,主要负责调度工作进程:加载配置、启动工作进程及非停升级。worker 进程负责处...
YouTube排名第一的励志英文演讲《Dream(梦想)》
Idon’t know what that dream is that you have, I don't care how disappointing it might have been as you've been working toward that dream,but that dream that you’re holding in your mind, that it’s po...
“狗屁不通文章生成器”登顶GitHub热榜,分分钟写出万字形式主义大作
一、垃圾文字生成器介绍 最近在浏览GitHub的时候,发现了这样一个骨骼清奇的雷人项目,而且热度还特别高。 项目中文名:狗屁不通文章生成器 项目英文名:BullshitGenerator 根据作者的介绍,他是偶尔需要一些中文文字用于GUI开发时测试文本渲染,因此开发了这个废话生成器。但由于生成的废话实在是太过富于哲理,所以最近已经被小伙伴们给玩坏了。 他的文风可能是这样的: 你发现,...
程序员:我终于知道post和get的区别
是一个老生常谈的话题,然而随着不断的学习,对于以前的认识有很多误区,所以还是需要不断地总结的,学而时习之,不亦说乎
相关热词 c# clr dll c# 如何orm c# 固定大小的字符数组 c#框架设计 c# 删除数据库 c# 中文文字 图片转 c# 成员属性 接口 c#如何将程序封装 16进制负数转换 c# c#练手项目
立即提问