Python: how to convert windows-1252 to utf-8

Here is the problem:
My Python program reads a file name that contains Chinese characters:
'E:\MyProject\SVN_Project\Drawingboard_local\model\mydata\input\production\a\һ�ɳ���.htm'

The Chinese part comes out garbled. The correct path should be:
'E:\MyProject\SVN_Project\Drawingboard_local\model\mydata\input\production\a\示波器.htm'

I extracted the garbled part that should read "示波器" and ran chardet on it to detect the string's encoding:
mycoding = chardet.detect(videoFileName)["encoding"]
chardet reports the encoding of that Chinese part as 'windows-1252'.
However, I have already added this at the top of my Python file:

# -*- coding: utf-8 -*-

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

The string still does not come out as UTF-8, so this had no effect.

Writing the extracted Chinese part to a file keeps raising an error. Could anyone tell me how to convert windows-1252 to UTF-8? Many thanks.

1 answer

str2 = s.decode('windows-1252')  # s is the garbled byte string (a Python 2 str)
str2.encode('utf-8')             # the same text re-encoded as UTF-8 bytes
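For Python 2 (which the reload(sys)/setdefaultencoding trick above implies), a slightly fuller, hedged sketch of the whole round trip might look like this. It assumes the bytes really are windows-1252, as chardet reported; the literal byte string is only a placeholder standing in for the garbled name read from disk:

```
# -*- coding: utf-8 -*-
# Minimal Python 2 sketch: detect, decode to unicode, re-encode as UTF-8, write out.
import chardet

raw_name = '\xca\xbe\xb2\xa8\xc6\xf7'         # placeholder bytes for the garbled file name
print chardet.detect(raw_name)["encoding"]     # chardet's guess, e.g. 'windows-1252'

text = raw_name.decode('windows-1252')         # bytes -> unicode
utf8_bytes = text.encode('utf-8')              # unicode -> UTF-8 bytes

with open('out.txt', 'wb') as f:               # binary mode, so nothing re-encodes the data
    f.write(utf8_bytes)
```

Note that sys.setdefaultencoding only changes the implicit str/unicode conversion inside the interpreter; it does not change the bytes the file system hands you, so the decode/encode step still has to be done explicitly.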
Other related questions
How to reliably guess the encoding among MacRoman, CP1252, Latin1, UTF-8 and ASCII

At work it seems like no week ever passes without some encoding-related conniption, calamity, or catastrophe. The problem usually derives from programmers who think they can reliably process a "text" file without specifying the encoding. But you can't.

So it's been decided to henceforth forbid files from ever having names that end in *.txt or *.text. The thinking is that those extensions mislead the casual programmer into a dull complacency regarding encodings, and this leads to improper handling. It would almost be better to have no extension at all, because at least then you know that you don't know what you've got.

However, we aren't going to go that far. Instead you will be expected to use a filename that ends in the encoding. So for text files, for example, these would be something like README.ascii, README.latin1, README.utf8, etc.

For files that demand a particular extension, if one can specify the encoding inside the file itself, such as in Perl or Python, then you shall do that. For files like Java source where no such facility exists internal to the file, you will put the encoding before the extension, such as SomeClass-utf8.java.

For output, UTF-8 is to be strongly preferred.

But for input, we need to figure out how to deal with the thousands of files in our codebase named *.txt. We want to rename all of them to fit into our new standard. But we can't possibly eyeball them all. So we need a library or program that actually works.

These are variously in ASCII, ISO-8859-1, UTF-8, Microsoft CP1252, or Apple MacRoman. Although we know we can tell if something is ASCII, and we stand a good chance of knowing if something is probably UTF-8, we're stumped about the 8-bit encodings. Because we're running in a mixed Unix environment (Solaris, Linux, Darwin) with most desktops being Macs, we have quite a few annoying MacRoman files. And these especially are a problem.

For some time now I've been looking for a way to programmatically determine which of

1. ASCII
2. ISO-8859-1
3. CP1252
4. MacRoman
5. UTF-8

a file is in, and I haven't found a program or library that can reliably distinguish between those three different 8-bit encodings. We probably have over a thousand MacRoman files alone, so whatever charset detector we use has to be able to sniff those out. Nothing I've looked at can manage the trick. I had big hopes for the ICU charset detector library (http://userguide.icu-project.org/conversion/detection#TOC-CharsetDetector), but it cannot handle MacRoman. I've also looked at modules to do the same sort of thing in both Perl and Python, but again and again it's always the same story: no support for detecting MacRoman.

What I am therefore looking for is an existing library or program that reliably determines which of those five encodings a file is in, and preferably more than that. In particular it has to distinguish between the three 8-bit encodings I've cited, especially MacRoman. The files are more than 99% English language text; there are a few in other languages, but not many.

If it's library code, our language preference is for it to be in Perl, C, Java, or Python, and in that order. If it's just a program, then we don't really care what language it's in so long as it comes in full source, runs on Unix, and is fully unencumbered.

Has anyone else had this problem of a zillion legacy text files randomly encoded? If so, how did you attempt to solve it, and how successful were you? This is the most important aspect of my question, but I'm also interested in whether you think encouraging programmers to name (or rename) their files with the actual encoding those files are in will help us avoid the problem in the future. Has anyone ever tried to enforce this on an institutional basis, and if so, was that successful or not, and why?

And yes, I fully understand why one cannot guarantee a definite answer given the nature of the problem. This is especially the case with small files, where you don't have enough data to go on. Fortunately, our files are seldom small. Apart from the random README file, most are in the size range of 50k to 250k, and many are larger. Anything more than a few K in size is guaranteed to be in English.

The problem domain is biomedical text mining, so we sometimes deal with extensive and extremely large corpora, like all of PubMedCentral's Open Access repository. A rather huge file is the BioThesaurus 6.0, at 5.7 gigabytes. This file is especially annoying because it is almost all UTF-8. However, some numbskull went and stuck a few lines in it that are in some 8-bit encoding (Microsoft CP1252, I believe). It takes quite a while before you trip on that one. :(

Reposted from: https://stackoverflow.com/questions/4198804/how-to-reliably-guess-the-encoding-between-macroman-cp1252-latin1-utf-8-and
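The two easy cases the poster mentions (telling ASCII apart, and having a good chance with UTF-8) can be checked without any detector library, simply by attempting strict decodes. A minimal Python sketch follows; the three labels are my own naming, and it deliberately leaves the hard part untouched: it does not try to separate Latin1, CP1252 and MacRoman.

```
import sys

def rough_classify(path):
    """Label a file as ascii, probably-utf8, or unknown-8bit via strict decoding."""
    with open(path, 'rb') as f:
        data = f.read()
    try:
        data.decode('ascii')
        return 'ascii'
    except UnicodeDecodeError:
        pass
    try:
        data.decode('utf-8')          # strict: any invalid sequence falls through
        return 'probably-utf8'
    except UnicodeDecodeError:
        return 'unknown-8bit'         # Latin1 / CP1252 / MacRoman left undistinguished

if __name__ == '__main__':
    for p in sys.argv[1:]:
        print(p, rough_classify(p))
```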

Python 3: why does chardet.detect return the wrong page encoding, and how do I transcode?

For example, on this site https://www.quanmin.tv/ the page source contains <meta charset="utf-8">, but chardet.detect on the response reports Windows-1254. Why does this happen, and how can the content be transcoded to utf-8 in this case? Calling decode and then encode does not produce a correct result. For some other pages chardet.detect even returns ascii or None. Why does that happen, and how do I convert those to utf-8? Any advice would be appreciated; I have been stuck on this for a while.
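One hedged way to handle this with requests (a sketch, not a guaranteed fix for the pages above): trust the charset declared in the HTTP header when there is one, and only fall back to a detector-based guess otherwise. requests exposes such a guess as Response.apparent_encoding.

```
import requests

def fetch_as_utf8(url):
    """Return the page body re-encoded as UTF-8 bytes, preferring declared charsets."""
    resp = requests.get(url, timeout=10)
    # requests fills resp.encoding from the Content-Type header; when no charset is
    # declared it often defaults to ISO-8859-1, in which case the detector-based
    # guess is usually a better bet.
    if not resp.encoding or resp.encoding.lower() == 'iso-8859-1':
        resp.encoding = resp.apparent_encoding or 'utf-8'
    return resp.text.encode('utf-8')   # resp.text is decoded with resp.encoding

if __name__ == '__main__':
    print(len(fetch_as_utf8('https://www.quanmin.tv/')))
```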

How to convert ANSI text to utf8 in the Go language?

How to convert ansi text to utf8 in golang (go language)? I am trying to convert an ANSI string to a UTF-8 string.

A Python example of calling VLC to display video

将libvlc.dll和libvlccore.dll放在sdk文件夹中,plugins文件夹也放在sdk文件夹中,sdk文件夹放在程序目录下。 我弄这个,实际上是为了推广,我在北京。有意者请加我微信,我的微商微信:xi9902 myvlc.py文件: import ctypes import os class myvlc(): def __init__(self): plugin_arg = r"--plugin-path=" + os.path.join(os.getcwd(), r'sdk\plugins') arguments = ["-I", "dummy", "--no-ignore-config", plugin_arg] arguments = [bytes(a,"utf-8") for a in arguments] p = os.getcwd() os.chdir(os.path.join(p, 'sdk')) dll = ctypes.CDLL("libvlc.dll") os.chdir(p) self.dll = dll self.libvlc_instance_ = self.dll.libvlc_new(len(arguments),(ctypes.c_char_p * len(arguments))(*arguments)) self.libvlc_media_player_ = self.dll.libvlc_media_player_new(self.libvlc_instance_) self.libvlc_media = None def __del__(self): self.Stop() if(self.libvlc_media): self.dll.libvlc_media_release(self.libvlc_media) if(self.libvlc_media_player_): self.dll.libvlc_media_player_release(self.libvlc_media_player_) if(self.libvlc_instance_): self.dll.libvlc_release(self.libvlc_instance_) def SetRenderWindow(self,hwnd):#嵌入窗体 if(self.libvlc_instance_): self.dll.libvlc_media_player_set_hwnd(self.libvlc_media_player_,hwnd) def PlayFile(self,filePath):#播放文件 if(self.libvlc_media): self.dll.libvlc_media_release(self.libvlc_media) self.libvlc_media = None self.libvlc_media = self.dll.libvlc_media_new_path(self.libvlc_instance_, filePath.encode("utf-8")) if(self.libvlc_media): self.dll.libvlc_media_player_set_media(self.libvlc_media_player_, self.libvlc_media) self.dll.libvlc_media_player_play(self.libvlc_media_player_) def PlayUrl(self,filePath):#播放网络流 if(self.libvlc_media): self.dll.libvlc_media_release(self.libvlc_media) self.libvlc_media = None self.libvlc_media = self.dll.libvlc_media_new_location(self.libvlc_instance_, filePath.encode("utf-8")) if(self.libvlc_media): self.dll.libvlc_media_player_set_media(self.libvlc_media_player_, self.libvlc_media) self.dll.libvlc_media_player_play(self.libvlc_media_player_) def Stop(self):#停止播放 if(self.libvlc_media_player_): self.dll.libvlc_media_player_stop(self.libvlc_media_player_) if(self.libvlc_media): self.dll.libvlc_media_release(self.libvlc_media) self.libvlc_media = None 调用方法: import myvlc v=myvlc.myvlc() self.v=v v.SetRenderWindow(ctypes.c_void_p(int(self.pushButton_8.winId()))) v.PlayFile(r'F:\111.wmv')

Converting a PHP cURL request for the PayGate API to Python

I am not very familiar with PHP and cURL, and I need to convert an advanced PHP cURL POST request into its Python equivalent.

It is code from a payment gateway site called PayGate, and I am using their sample PHP API from developer.paygate.co.za/. The code that I tried to convert into Python is below:

```
<?php
//The PayGate PayXML URL
define( "SERVER_URL", "https://www.paygate.co.za/payxml/process.trans" );

//Construct the XML document header
$XMLHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE protocol SYSTEM \"https://www.paygate.co.za/payxml/payxml_v4.dtd\">";

// - Then construct the full transaction XML
$XMLTrans = '<protocol ver="4.0" pgid="10011013800" pwd="test"><authtx cref="ABCqwerty1234" cname="Patel Sunny" cc="5200000000000015" exp="032022" budp="0" amt="10000" cur="ZAR" cvv="123" rurl="http://localhost/pg_payxml_php_final.php" nurl="http://localhost/pg_payxml_php_notify.php" /></protocol>'

// Construct the request XML by combining the XML header and transaction
$Request = $XMLHeader.$XMLTrans;

// Create the POST data header containing the transaction
$header[] = "Content-type: text/xml";
$header[] = "Content-length: ".strlen($Request)." ";
$header[] = $Request;

// Use cURL to post the transaction to PayGate
// - first instantiate cURL; if it fails then quit now.
$ch = curl_init();
if (!$ch) die("ERROR: cURL initialization failed. Check your cURL/PHP configuration.");

// - then set the cURL options; to ignore SSL invalid certificates; set timeouts etc.
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt ($ch, CURLOPT_CUSTOMREQUEST, "POST");

// - then set the PayXML URL and the transaction data
curl_setopt ($ch, CURLOPT_URL, SERVER_URL);
curl_setopt ($ch, CURLOPT_HTTPHEADER, $header);

// Connect to PayGate PayXML and send data
$Response = curl_exec ($ch);

// Check for any connection errors and then close the connection.
$curlError = curl_errno($ch);
curl_close($ch);
```

I know about basic requests in Python, but I couldn't pass the attributes in that request, and I am also confused about how to pass the cURL data with requests.

I am trying it like:

```
import requests

post_data = {'pgid':'10011013800', 'pwd':'test', 'cref': 'ABCX1yty36858gh', 'cname':'PatelSunny',
             'cc':'5200000470000015', 'exp':'032022', 'budp':'0', 'amt':'50000', 'cur':'ZAR',
             'cvv':'123', 'rurl':'http://localhost/pg_payxml_php_final.php',
             'nurl':'http://localhost/pg_payxml_php_notify.php',
             'submit':'Submit'}

r = requests.get('https://www.paygate.co.za/payxml/process.trans', params=post_data, headers=headers)
# print(r.url)
print r.text
```

But it shows the error:

    405 - HTTP verb used to access this page is not allowed.
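The PHP above sends the raw XML document in a POST body with a text/xml content type, while the Python attempt sends form fields via GET, which is presumably why the server answers 405. A hedged sketch of a closer translation using requests follows; the XML string is copied from the PHP sample, and whether PayGate accepts it this way is not verified here.

```
import requests

SERVER_URL = "https://www.paygate.co.za/payxml/process.trans"

xml_header = ('<?xml version="1.0" encoding="UTF-8"?>'
              '<!DOCTYPE protocol SYSTEM "https://www.paygate.co.za/payxml/payxml_v4.dtd">')
xml_trans = ('<protocol ver="4.0" pgid="10011013800" pwd="test">'
             '<authtx cref="ABCqwerty1234" cname="Patel Sunny" cc="5200000000000015" '
             'exp="032022" budp="0" amt="10000" cur="ZAR" cvv="123" '
             'rurl="http://localhost/pg_payxml_php_final.php" '
             'nurl="http://localhost/pg_payxml_php_notify.php" /></protocol>')
payload = xml_header + xml_trans

headers = {
    "Content-Type": "text/xml",   # mirrors the PHP "Content-type: text/xml" header
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
}

# POST the XML document itself as the request body, like the cURL version does.
resp = requests.post(SERVER_URL, data=payload, headers=headers, timeout=60, verify=False)
print(resp.status_code)
print(resp.text)
```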

Python: urllib2 urlopen does not open the page correctly

``` #!/usr/bin/python # -*- coding: utf-8 -*- import urllib; import urllib2; import os; import sys; import shutil; def searchVT(): VTMainUrl = 'https://www.virustotal.com/en/#search'; headers = { 'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'content-type':'application/x-www-form-urlencode', 'referer':'https://www.virustotal.com/', 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.101 Safari/537.36'}; postDict = {'query' : '18092AC0F4F694C60762DC98C9F66BC3',}; postData = urllib.urlencode(postDict); req = urllib2.Request(VTMainUrl, postData, headers); try: respHtml = urllib2.urlopen(req).read(); except urllib2.URLError,e: if hasattr(e,"reason"): print "Failed to reach the server" print "The reason:",e.reason elif hasattr(e,"code"): print "The server couldn't fulfill the request" print "Error code:",e.code print "Return content:",e.read() else: pass #其他异常的处理 file_object = open('thefile.txt', 'w') file_object.write(respHtml) file_object.close( ) print respHtml; return respHtml; if __name__=="__main__": searchVT(); ``` 最近使用urllib2 urlopen尝试打开VT网页并进行使用MD5查询,但是不知道为什么返回的网页为空,求大神赐教~

golang CLI: how to detect the current terminal encoding and convert user input to and from utf8?

I am writing a golang command line program that accepts user input. This input string has to be converted to UTF-8 and sent to another server for processing. On Linux, the terminal encoding is almost always UTF-8 but this does not seem to be the case in Windows. I tried setting the codepage on windows to 65001 using

```
chcp 65001
```

and also ensured the terminal font is set to Lucida console. However, the bytes read by

```
fmt.Scanf()
```

is not in UTF-8 format. I want to be able to detect the character encoding and convert the strings to UTF-8. Similarly, I should be able to convert from UTF-8 to the local encoding before printing to the screen.

Python seems to have a "locale" package which can get the default encoding, decode and encode strings to any specified encoding. Is there an equivalent of this for golang?

Most of the stackoverflow discussions pointed at using chcp 65001 to change the encoding on the windows terminal to UTF-8. This doesn't seem to work for me.

```
func main() {
    foo := ""
    fmt.Printf("Enter: ")
    if _, err := fmt.Scanln(&foo); err != nil {
        fmt.Println("Error while scanning: ", err)
    }
    fmt.Printf("Scanned bytes: % x", foo)
    fmt.Println()
}
```

On Linux:

```
// ASCII
$ go run test.go
Enter: hello
Scanned bytes: 68 65 6c 6c 6f

// Unicode
$ go run test.go
Enter: ©
Scanned bytes: c2 a9

// Unicode
$ go run test.go
Enter: ΆΏΑΓΔΘΞ
Scanned bytes: ce 86 ce 8f ce 91 ce 93 ce 94 ce 98 ce 9e ce a3 ce a8 ce a9 ce aa ce ad ce b1 ce b2 ce ba
```

On Windows:

```
PS C:\> chcp
Active code page: 437
PS C:\> go run .\test.go
Enter: hello
Scanned bytes: 68 65 6c 6c 6f
PS C:\> go run .\test.go
Enter: ΆΏΑΓΔΘΞ
Scanned bytes: 3f 3f 61

// Change to Unicode
PS C:\> chcp 65001
Active code page: 65001
PS C:\> go run .\test.go
Enter: ΆΏΑΓΔΘΞ
Error while scanning: EOF
Scanned bytes:
```

Appreciate any help/pointers.

Python: scraping a web table but getting no data

我使用python爬取网页表格数据的时候使用 request.get获取不到页面内容。 爬取网址为:http://data.10jqka.com.cn/rank/cxg/board/4/field/stockcode/order/desc/page/2/ajax/1/free/1/ 这是Elements ![图片说明](https://img-ask.csdn.net/upload/202002/17/1581950847_829340.jpg) ``` import os import requests from lxml import etree url='http://data.10jqka.com.cn/rank/cxg/board/4/field/stockcode/order/desc/page/2/ajax/1/free/1/' #url1='http://data.10jqka.com.cn/rank/cxg/' headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36'} res = requests.get(url, headers=headers) res_elements = etree.HTML(res.text) table = res_elements.xpath('/html/body/table') print(table) table = etree.tostring(table[0], encoding='utf-8').decode() df = pd.read_html(table, encoding='utf-8', header=0)[0] results = list(df.T.to_dict().values()) # 转换成列表嵌套字典的格式 df.to_csv("std.csv", index=False) ``` res.text 里的数据为 (不包含列表数据) ``` '<html><body>\n <script type="text/javascript" src="//s.thsi.cn/js/chameleon/chameleon.min.1582008.js"></script> <script src="//s.thsi.cn/js/chameleon/chameleon.min.1582008.js" type="text/javascript"></script>\n <script language="javascript" type="text/javascript">\n window.location.href="http://data.10jqka.com.cn/rank/cxg/board/4/field/stockcode/order/desc/page/2/ajax/1/free/1/";\n </script>\n </body></html>\n' ```

Python multi-threaded downloader: why won't it download? "starting failed"

代码: ``` from downloader import Downloader #, cStringIO, cPickle from threading import Thread from time import sleep import log2 as log from os.path import basename import requests as req import pickle from os.path import exists db='E:/tmp/download.data' def append(obj): try: if exists(db): with open(db,'rb') as f: data=pickle.load(f) else: data={} except: data={} data[obj['url']]=obj with open(db,'wb') as f: pickle.dump(data,f) def load(url): if not exists(db): return None try: with open(db,'rb') as f: data=pickle.load(f) return data.get(url) except: return None def out(msg): print(msg) import time from os.path import basename, exists, getsize from queue import Queue from threading import Lock, Thread, current_thread import requests as req import random as rand import conf class Downloader: KB=1024 MB=KB*KB GB=KB*MB range_size=MB max_workers=10 spd_refresh_interval=1 user_agents=[ 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36' 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36', 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0' ] chunk_size=KB max_error=0.1 #单线程允许最大出错率 max_error_one_worker=0.5 #仅剩一个线程时允许的最大出错率 home='E:/tmp/' #下载目录 def __init__(self,c): self.__locks={i:Lock() for i in ('file','worker_info','itr_job','download_info')} self.__config=c self.__alive=False self.__fails=Queue() self.__conf=c c=conf.load(c['url']) if c: self.__conf=c self.__init_from_conf() else: self.__init_task() def __init_from_conf(self): self.__download_offset=self.__conf['offset'] for i in self.__conf['fails']: self.__fails.put(i) def __get_agent(self): return self.user_agents[rand.randint(0,len(self.user_agents)-1)] def __init_task(self): headers={'Range':'bytes=0-0'} headers = {'Host': 'https://files.pythonhosted.org/packages/','User-Agent':'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER','Referer': 'https://pypi.org/', 'Connection': 'keep-alive', 'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'Accept-Encoding': 'gzip, deflate, sdch, br', 'Accept-Language': 'zh-CN,zh;q=0.8'} headers['User-Agent']=self.__get_agent() print(headers) try: r=req.get(self.__conf['url'],headers=headers,stream=True) self.__conf['name'] = basename(self.__conf['url']) or str(int(round(time.time()*1000))) self.__conf['206'] = r.status_code == 206 or r.headers.get('Accept-Ranges')=='bytes' if self.__conf['206']: self.__conf['len']=int(r.headers['Content-Range'].split('/')[-1]) elif r.status_code!=200: out('init task err') return else: self.__conf['len']=int(r.headers['Content-Length']) r.close() self.__download_offset=0 self.__conf['init']=True except Exception as e: out(e) def __itr_job(self): if self.__locks['itr_job'].acquire(): if not self.__fails.empty(): ans=self.__fails.get() elif self.__download_offset<self.__conf['len']: o=self.__download_offset ans=(o,min(self.__conf['len']-1,o+self.range_size-1)) 
self.__download_offset+=self.range_size else: ans=(-1,-1) self.__locks['itr_job'].release() return ans def __has_job(self): if self.__locks['itr_job'].acquire(): ans=self.__download_offset<self.__conf['len'] or not self.__fails.empty() self.__locks['itr_job'].release() return ans def __download_no_206(self): headers={'User-Agent':self.__get_agent()} r=req.get(self.__conf['url'],headers=headers,stream=True) self.__download_offset=0 if r.status_code != 200: r.close() self.__stopped() return try: for con in r.iter_content(chunk_size=self.chunk_size): if self.__kill_signal: break self.__file.write(con) l=len(con) self.__down_bytes+=l self.__download_offset+=l t0=time.time() t=t0-self.__last_time if t>=self.spd_refresh_interval: self.__down_spd=self.__down_bytes/t out('downspd: %d KB/s'%(self.__down_spd/self.KB)) self.__last_time=t0 self.__down_bytes=0 except: pass r.close() self.__stopped() def __download_206(self): file_len=self.__conf['len'] total=0 error=0 kill=False with req.session() as sess: while True: s,e=self.__itr_job() if s==-1: out('no job stop') break headers={'Range':'bytes=%d-%d'%(s,e)} headers['User-Agent']=self.__get_agent() try: r=sess.get(self.__conf['url'],headers=headers,stream=True) total+=1 if r.status_code!=206: self.__fails.put((s,e)) error+=1 if error>self.max_error*total: if self.__locks['worker_info'].acquire(): num=self.__current_workers self.__locks['worker_info'].release() if error>self.max_error_one_worker*total or num>1: break continue for con in r.iter_content(chunk_size=self.chunk_size): if self.__locks['worker_info'].acquire(): if self.__kill_signal: self.__locks['worker_info'].release() kill=True break self.__locks['worker_info'].release() if self.__locks['file'].acquire(): self.__file.seek(s) self.__file.write(con) l=len(con) s+=l self.__locks['file'].release() if self.__locks['download_info'].acquire(): self.__down_bytes+=l t0=time.time() t=t0-self.__last_time if t>=self.spd_refresh_interval: out('downspd: %d KB/s'%(self.__down_spd/self.KB)) self.__down_spd=self.__down_bytes/t self.__down_bytes=0 self.__last_time=t0 self.__locks['download_info'].release() if s<=e and s<file_len: self.__fails.put((s,e)) if kill: break except : self.__fails.put((s,e)) error+=1 if error>self.max_error*total: if self.__locks['worker_info'].acquire(): num=self.__current_workers self.__locks['worker_info'].release() if error>self.max_error_one_worker*total or num>1: break self.__stopped() def __start_worker(self,target): if self.__locks['worker_info'].acquire(): if self.__kill_signal: self.__locks['worker_info'].release() return False if self.__current_workers<self.max_workers: Thread(target=target).start() self.__current_workers+=1 out('new worker started,current workers %d'%self.__current_workers) self.__locks['worker_info'].release() return True def __start_workers(self): for _ in range(self.max_workers): if not self.__start_worker(self.__download_206): break time.sleep(0.8) def start(self): if self.__alive: out('already started!') return if self.__conf.get('status')=='done': out('already done') return self.__alive=True self.__kill_signal=False self.__conf['status']='working' self.__down_bytes=0 self.__down_spd=0 self.__last_time=0 self.__current_workers=0 self.__start_time=time.time() try: path=self.home+self.__conf['name'] self.__file=open(path,(exists(path) and 'rb+') or 'wb' ) if not self.__conf['206']: Thread(target=self.__start_workers).start() else: self.__start_worker(self.__download_no_206) out('starting done!') except: out('starting failed') def stop(self): if 
self.__kill_signal: return out('stopping') if self.__locks['worker_info'].acquire(): self.__kill_signal=True if self.__conf['status']=='working': self.__conf['status']='stopped' self.__locks['worker_info'].release() def __after_stopped(self): if not self.__kill_signal: self.__kill_signal=True __alive=False self.__file.close() out('total time: %.2f'%(time.time()-self.__start_time)) self.__conf['offset']=self.__download_offset if not self.__has_job(): self.__conf['status']='done' elif self.__conf.get('status')!='stopped': self.__conf['status']='error' leak=0 ls=[] while not self.__fails.empty(): i=self.__fails.get() leak+=i[1]-i[0]+1 ls.append(i) self.__conf['fails']=ls leak+=max(self.__conf['len']-self.__download_offset,0) out('total leak: %d'%leak) conf.append(self.__conf) def __stopped(self): if self.__locks['worker_info'].acquire(): self.__current_workers-=1 out('%s stopped'%current_thread().name) if self.__current_workers==0: self.__after_stopped() self.__locks['worker_info'].release() #!/usr/bin/env python # coding=utf-8 #import importlib,sys #import sys #sys.setdefaultencoding('gbk') '''import sys import imp import sys reload(sys) sys.setdefaultencoding('utf8') ''' ''' import sys sys.setdefaultencoding('utf-8') import jieba import json''' def main(): from bs4 import BeautifulSoup import urllib.request import urllib.parse as parse import ssl import re import os,os.path import codecs import requests def getHtml(url): global html page = urllib.request.urlopen(url) html = page.read() return html def file(url1,file_name,name): print(url1) #file(name,save_path,filename) #url1= +'/' + filename url1=url1.encode() #file = open(name ,'wb+') #file.write(url1 ) #file.close() #print(file_name) headers = {'Host': 'https://files.pythonhosted.org/packages/','User-Agent':'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER','Referer': 'https://pypi.org/', 'Connection': 'keep-alive', 'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'Accept-Encoding': 'gzip, deflate, sdch, br', 'Accept-Language': 'zh-CN,zh;q=0.8'} #req = urllib.urlretrieve(download_url,headers=headers) #urllib.request.urlopen('https://www.lfd.uci.edu/~gohlke/pythonlibs/') #req = urllib.request.Request(url=url,headers=header) #request = urllib.request.urlopen(url1) #response = urllib.request.urlopen(request) import socket import urllib.request #设置超时时间为30s socket.setdefaulttimeout(5) #解决下载不完全问题且避免陷入死循环 '''try: urllib.request.urlretrieve(url1.decode(),name) except socket.timeout:''' count = 1 while count <= 1: import time # 格式化成2016-03-20 11:45:39形式 print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())) # 格式化成Sat Mar 28 22:24:24 2016形式 print(time.strftime("%a %b %d %H:%M:%S %Y", time.localtime())) # 将格式字符串转换为时间戳 a = "Sat Mar 28 22:24:24 2016" print(time.mktime(time.strptime(a,"%a %b %d %H:%M:%S %Y"))) try: urllib.request.urlretrieve(url1.decode(),name) print('\nchangshi'+str(count)+'over\n') break except socket.timeout: err_info = 'Reloading for %d time'%count if count == 1 else 'Reloading for %d times'%count print(err_info) count += 1 except urllib.error.HTTPError: print('urllib.error.HTTPError') except urllib.error.URLError: print('urllib.error.URLError') except ssl.SSLWantReadError: print('ssl.SSLWantReadError') if count > 1: print("downloading picture fialed!") 
#urllib.request.urlretrieve(url1.decode(),name) global i i += 1 print(url1.decode()) #file = open(name ,'wt+') #file.write(str(req.content())) #file.close() print(file_name) global x print("Completed : .... %d ..." % x) '''for i in range(len(name_list)): j=0 if name_list[i-24:i+1]=='https://pypi.org/project/': name_list1.append(name_list[i+1:i+60])''' print('\n........'+name+'..........complete\n') '''headers = {'Host': 'download.lfd.uci.edu','User-Agent':'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER','Referer': 'https://www.lfd.uci.edu/~gohlke/pythonlibs/', 'Connection': 'keep-alive', 'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'Accept-Encoding': 'gzip, deflate, sdch, br', 'Accept-Language': 'zh-CN,zh;q=0.8'} #req = urllib.urlretrieve(download_url,headers=headers) #urllib.request.urlopen('https://www.lfd.uci.edu/~gohlke/pythonlibs/') #req = urllib.request.Request(url=url,headers=header) request = requests.get(url=url1,headers=headers) #response = urllib.request.urlopen(request) global i i += 1 file = open(name ,'wb+') file.write(request.content) file.close() print(file_name) print("Completed : .... %d ..." % x)''' save_path = os.getcwd() url = 'https://www.lfd.uci.edu/' html = getHtml(url) html=''' </li> <li><a id="kwant"></a><strong><a href="http://kwant-project.org/">Kwant</a></strong>: quantum transport simulations made easy.<br> Requires <a href="https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy">numpy+mkl</a> and <a href="https://www.lfd.uci.edu/~gohlke/pythonlibs/#tinyarray">tinyarray</a>. 
<ul> <li><a href="javascript:;" onclick=" javascript:dl([101,116,54,104,51,56,113,108,46,99,118,106,49,119,109,45,50,110,115,95,112,107,47,105,97,53,52,100], &quot;A?:5C9H0ED&lt;G@0&gt;;7I7;&gt;8C34&gt;8C34&gt;&lt;F@BG=J1I7&lt;26&quot;); &quot;javascript: dl(&quot;" title="[2.5 MB] [Jul 06, 2019]">kwant‑1.4.1‑cp38‑cp38‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,45,107,108,97,47,116,113,110,99,56,49,118,46,104,50,115,105,53,112,106,119,52,51], &quot;?&gt;C6B;A541D3750:&lt;E&lt;:08BF908BF90D@7F&gt;&lt;D=2&quot;); &quot;javascript: dl(&quot;" title="[2.1 MB] [Jul 06, 2019]">kwant‑1.4.1‑cp38‑cp38‑win32.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,46,48,105,104,110,51,107,108,99,115,118,109,113,55,100,53,47,54,50,49,119,45,116,112,97,95,52,106], &quot;9BK&lt;G:?F@6DH4FEC0J01E8G5=E8G5=;ED24IH;&gt;AJ0D37&quot;); &quot;javascript: dl(&quot;" title="[2.4 MB] [Feb 28, 2019]">kwant‑1.4.0‑cp37‑cp37m‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,52,112,106,45,109,51,99,108,48,104,107,46,53,118,97,105,116,113,119,47,55,50,110,49,115], &quot;HE2A1=&lt;@C:B&gt;F@3G;0;83615D3615D43B?F5E;B97&quot;); &quot;javascript: dl(&quot;" title="[2.1 MB] [Feb 28, 2019]">kwant‑1.4.0‑cp37‑cp37m‑win32.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,112,99,97,49,105,54,113,115,108,109,53,52,116,51,118,106,107,110,104,50,95,47,48,45,119,100,46], &quot;7C?60&gt;:&lt;E@H2A&lt;G3J;JFG10=5G10=59GH4AD29I5;JHB8&quot;); &quot;javascript: dl(&quot;" title="[2.4 MB] [Feb 28, 2019]">kwant‑1.4.0‑cp36‑cp36m‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,47,48,107,53,108,49,119,52,105,110,115,50,104,112,106,116,45,51,113,99,97,46,118,109,54], &quot;:;&gt;B=F3?026D9?@5E7E1@C=AH@C=AHG@689A;E6&lt;4&quot;); &quot;javascript: dl(&quot;" title="[2.1 MB] [Feb 28, 2019]">kwant‑1.4.0‑cp36‑cp36m‑win32.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,97,51,53,50,107,46,105,54,49,47,104,52,109,100,115,118,119,108,48,112,116,45,99,113,110,106,95], &quot;&gt;3IGC?2D9FC1294@0HDE85;5BEFC12EFC12&lt;E@6HJ0&lt;=7;5@:A&quot;); &quot;javascript: dl(&quot;" title="[2.4 MB] [Feb 28, 2019]">kwant‑1.4.0‑cp35‑cp35m‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,46,52,50,104,49,118,119,99,48,107,113,97,115,47,51,45,105,110,112,53,108,106,116,109], &quot;&lt;2E:B5CF=7B&gt;C=96;AF?40108?7B&gt;C?7B&gt;CG?6@A&gt;2063D&quot;); &quot;javascript: dl(&quot;" title="[2.0 MB] [Feb 28, 2019]">kwant‑1.4.0‑cp35‑cp35m‑win32.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,49,108,109,119,46,113,107,97,95,99,105,110,53,51,104,100,116,112,54,50,52,115,45,118,47,106], &quot;ECI5AG&lt;@H9A=DH637;@F04=4CF9A=DF9A=D2F3:;872?BD43&gt;1&quot;); &quot;javascript: dl(&quot;" title="[2.1 MB] [Jan 06, 2018]">kwant‑1.3.2‑cp34‑cp34m‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,106,99,112,47,113,97,116,52,51,49,107,53,109,104,105,110,108,119,45,115,46,50,118], &quot;CE042F;6312873:A5?6B9D8DEB1287B1287&lt;BA&gt;?8EDA=@&quot;); &quot;javascript: dl(&quot;" title="[1.8 MB] [Jan 06, 2018]">kwant‑1.3.2‑cp34‑cp34m‑win32.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,51,110,97,105,113,45,47,104,108,99,112,46,100,109,55,53,95,119,49,106,50,107,54,118,116,52,115], &quot;JDC4:G?H69:D&gt;6EA21H5B;B;059:D&gt;59:D&gt;=5A31@2=&lt;FI;A78&quot;); &quot;javascript: dl(&quot;" title="[13.5 MB] [May 15, 
2017]">kwant‑1.1.3‑cp27‑cp27m‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,55,113,108,46,104,107,106,49,115,118,99,50,119,47,45,51,97,116,110,53,109,112,105], &quot;8;61E9CA=:E;0=5&lt;@BA&gt;7373?&gt;:E;0&gt;:E;0D&gt;&lt;FB?;3&lt;42&quot;); &quot;javascript: dl(&quot;" title="[6.7 MB] [May 15, 2017]">kwant‑1.1.3‑cp27‑cp27m‑win32.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,113,99,45,52,119,95,49,111,100,116,48,46,47,118,55,97,112,108,107,115,54,110,105,50,101,104,53,106,109], &quot;CGK0@=J9&lt;1@G&gt;&lt;B4?E926;:;J21@G&gt;2E7EH24FE5?L8D3;4IA&quot;); &quot;javascript: dl(&quot;" title="[13.4 MB] [Sep 11, 2015]">kwant‑1.0.5‑cp27‑none‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,106,47,105,101,113,111,97,108,50,107,45,112,104,118,48,49,110,119,99,115,53,51,46,55,116], &quot;C804;=DH1B;8G19A6@H:?F&gt;FD:B;8G:@5@3:A2@E8FA&lt;7&quot;); &quot;javascript: dl(&quot;" title="[6.7 MB] [Sep 11, 2015]">kwant‑1.0.5‑cp27‑none‑win32.whl</a></li> </ul> </li> <li><a id="la"></a><strong><a href="https://github.com/kwgoodman/la">La</a></strong>: aka larry, the labeled numpy array. <ul> <li><a href="javascript:;" onclick=" javascript:dl([101,97,109,99,108,48,51,46,110,54,50,105,47,95,53,104,113,55,45,100,112,118,52,101,115,106,116,119], &quot;G9H?CD=I;2C5=;30A46@646BFD4A2C5=A2C5=1AJ:7&lt;01B8E6J&gt;3&quot;); &quot;javascript: dl(&quot;" title="[139 KB] [Apr 11, 2016]">la‑0.7.0.dev0‑cp35‑cp35m‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,108,55,110,99,112,48,47,116,106,109,115,119,118,100,53,51,104,45,105,113,46,101,50,97], &quot;:F8C4&lt;&gt;7634?&gt;60GA5D1D5D=E&lt;5A34?&gt;A34?&gt;9A;B2?FD;@0&quot;); &quot;javascript: dl(&quot;" title="[137 KB] [Apr 11, 2016]">la‑0.7.0.dev0‑cp35‑cp35m‑win32.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,109,55,118,110,115,50,106,104,113,116,108,53,97,100,54,101,47,52,105,112,46,95,51,45,99,119,48], &quot;4568C2;9@HCFA@:&lt;GJD1DJD=?2JGHCFAGHCFA0GIB3E&lt;0=&gt;ADI7:&quot;); &quot;javascript: dl(&quot;" title="[137 KB] [Apr 11, 2016]">la‑0.7.0.dev0‑cp34‑cp34m‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,105,48,55,112,101,52,53,99,51,115,50,109,97,47,110,118,113,106,116,108,46,119,104,100,45], &quot;9:A@3?6B=7385=C&lt;H1D2D1DG4?1H7385H7385;HE0&gt;8:DEFC&quot;); &quot;javascript: dl(&quot;" title="[136 KB] [Apr 11, 2016]">la‑0.7.0.dev0‑cp34‑cp34m‑win32.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,45,105,55,100,116,97,99,106,113,101,95,54,50,110,104,47,115,119,118,109,108,52,46,53,48,112], &quot;@&lt;78IBG4?6I&lt;2?D50HF2FHF39BH06I&lt;206I&lt;2C0A1=:5C3;EFA&gt;D&quot;); &quot;javascript: dl(&quot;" title="[137 KB] [Apr 11, 2016]">la‑0.7.0.dev0‑cp27‑cp27m‑win_amd64.whl</a></li> <li><a href="javascript:;" onclick=" javascript:dl([101,119,55,53,48,105,118,112,47,115,45,108,116,97,113,104,99,100,106,51,110,109,46,50,101], &quot;8FA=652;7?6F17:&lt;93E1E3E@G539?6F19?6F1D904CBFE0&gt;:&quot;); &quot;javascript: dl(&quot;" title="[136 KB] [Apr 11, 2016]">la‑0.7.0.dev0‑cp27‑cp27m‑win32.whl</a></li> </ul> </li> ''' print('html done') #html.decode('utf-8') #print(html) '''headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1)AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'} r = requests.get(url, headers = headers) r.encoding = "utf-8" soup = BeautifulSoup(r.text, "html.parser") #html_mod=re.sub(pattern=".",repl=".",string=html.decode('utf-8')) for link in 
soup.find_all('a'): #soup.find_all返回的为列表 print(link.get('href')) #name_list+=link ''' name_list = html#soup.find_all('a')#re.findall(r']">*-cp38-win_amd64.whl',html.decode('utf-8')) x=1 files=os.listdir(save_path) print(files) print(type(name_list)) name_list=str(name_list) name_list1=[] #print(name_list) #for name in name_list: k=0 # name[k]=str(name1[k]) for i in range(len(name_list)): j=0 if name_list[i-2:i+1]==']">': name_list1.append(name_list[i+1:i+60]) global m if k<len(name_list1): for l in range(len(name_list1[k])): if l-9>=0: if name_list1[k][l-4:l]=='.whl' or name_list1[k][l-3:l]=='.gz' or name_list1[k][l-4:l]=='.zip': j=1 m=l if j==1: name_list1[k]=name_list1[k][0:m] k+=1 '''if j==0: name_list.remove(name)''' #file_name = os.path.join(save_path ,name) i=0 #print(name) print(name_list1) for name in name_list1: j=0 for l in range(len(name)): if l-9>=0: if name[l-4:l]=='.whl' or name[l-3:l]=='.gz' or name[l-4:l]=='.zip': j=1 m=l if j==1: name=name[0:m] k+=1 if name in files: continue '''if name=='Delny‑0.4.1‑cp27‑none‑win_amd64.whl</a></li>\n<li>' or name==Delny‑0.4.1‑cp27‑none‑win32.whl</a></li> </ul> </: continue ''' print('no:'+str(x)) print('\ndownload '+name) # importlib.reload(sys) #imp.reload(sys) for l in range(len(name)): if l-9>=0: if name[l-4:l]=='.whl' or name[l-3:l]=='.gz' or name[l-4:l]=='.zip': j=1 m=l if j==1: name=name[0:m] k+=1 string='https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/' + name#[0:4+name.find('.whl')]#https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/ print('00'+save_path) count=0 v=0 for p in range(len(string)): if string[p]=='\\': if v==0: string=string[:6]+'//'+string[7:] else: string=string[:p]+'/'+string[p+1:] v+=1 if string[p-3:p]=='win': string=string[:p-4]+'-'+string[p-3:] if p<len(string): if (string[p]=='\u2011')==True: if p+1<len(string): string=string[:p]+'-'+string[p+1:] '''if string[p-2]>='0' and string[p-2]<='9' and string[p-1]>='0' and string[p-1]<='9': if (string[p]>='a'and string[p]<='z') or (string[p]>='A'and string[p]<='Z'): string=string[:p]+string[p+1:]''' if p>=len(string): break '''if name[:9]=='ad3‑2.2.1': print('aaa') continue''' conf={'url':string} d=Downloader(conf) d.start() #file(string,save_path,name) x=x+1 print('09'+name_list) print('finished') if __name__ == '__main__': main() ``` 求高手解决

Scraping Sina news content with Python: why is the stock folder empty after the run finishes?

#! /usr/bin/env python #coding=utf-8 from scrapy.selector import Selector from scrapy.http import Request import re,os from bs4 import BeautifulSoup from scrapy.spider import Spider import urllib2,thread #处理编码问题 import sys reload(sys) sys.setdefaultencoding('gb18030') #flag的作用是保证第一次爬取的时候不进行单个新闻页面内容的爬取 flag=1 projectpath='C:\\Users\DELL\\Desktop\\pythonproject\\mypro\\' def loop(*response): sel = Selector(response[0]) #get title title = sel.xpath('//h1/text()').extract() #get pages pages=sel.xpath('//div[@id="artibody"]//p/text()').extract() #get chanel_id & comment_id s=sel.xpath('//meta[@name="comment"]').extract() #comment_id = channel[index+3:index+15] index2=len(response[0].url) news_id=response[0].url[index2-14:index2-6] comment_id='31-1-'+news_id #评论内容都在这个list中 cmntlist=[] page=1 #含有新闻url,标题,内容,评论的文件 file2=None #该变量的作用是当某新闻下存在非手机用户评论时置为False is_all_tel=True while((page==1) or (cmntlist != [])): tel_count=0 #each page tel_user_count #提取到的评论url url="http://comment5.news.sina.com.cn/page/info?version=1&format=js&channel=cj&newsid="+str(comment_id)+"&group=0&compress=1&ie=gbk&oe=gbk&page="+str(page)+"&page_size=100" url_contain=urllib2.urlopen(url).read() b='={' after = url_contain[url_contain.index(b)+len(b)-1:] #字符串中的None对应python中的null,不然执行eval时会出错 after=after.replace('null','None') #转换为字典变量text text=eval(after) if 'cmntlist' in text['result']: cmntlist=text['result']['cmntlist'] else: cmntlist=[] if cmntlist != [] and (page==1): filename=str(comment_id)+'.txt' path=projectpath+'stock\\' +filename file2=open(path,'a+') news_content=str('') for p in pages: news_content=news_content+p+'\n' item="<url>"+response[0].url+"</url>"+'\n\n'+"<title>"+str(title[0])+"</title>\n\n"+"<content>\n"+str(news_content)+"</content>\n\n<comment>\n" file2.write(item) if cmntlist != []: content='' for status_dic in cmntlist: if status_dic['uid']!='0': is_all_tel=False #这一句视编码情况而定,在这里去掉decode和encode也行 s=status_dic['content'].decode('UTF-8').encode('GBK') #见另一篇博客“三张图” s=s.replace("'",'"') s=s.replace("\n",'') s1="u'"+s+"'" try: ss=eval(s1) except: try: s1='u"'+s+'"' ss=eval(s1) except: return content=content+status_dic['time']+'\t'+status_dic['uid']+'\t'+ss+'\n' #当属于手机用户时 else: tel_count=tel_count+1 #当一个page下不都是手机用户时,这里也可以用is_all_tel进行判断,一种是用开关的方式,一种是统计的方式 #算了不改了 if tel_count!=len(cmntlist): file2.write(content) page=page+1 #while loop end here if file2!=None: #当都是手机用户时,移除文件,否则写入"</comment>"到文件尾 if is_all_tel: file2.close() try: os.remove(file2.name) except WindowsError: pass else: file2.write("</comment>") file2.close() class DmozSpider(Spider): name = "stock" allowed_domains = ["sina.com.cn"] #在本程序中,start_urls并不重要,因为并没有解析 start_urls = [ "http://news.sina.com.cn/" ] global projectpath if os.path.exists(projectpath+'stock'): pass else: os.mkdir(projectpath+'stock') def parse(self, response): #这个scrapy.selector.Selector是个不错的处理字符串的类,python对编码很严格,它却处理得很好 #在做这个爬虫的时候,碰到很多奇奇怪怪的编码问题,主要是中文,试过很多既有的类,BeautifulSoup处理得也不是很好 sel = Selector(response) global flag if(flag==1): flag=2 page=1 while page<260: url="http://roll.finance.sina.com.cn/finance/zq1/index_" url=url+str(page)+".shtml" #伪装为浏览器 user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' headers = { 'User-Agent' : user_agent } req = urllib2.Request(url, headers=headers) response = urllib2.urlopen(req) url_contain = response.read() #利用BeautifulSoup进行文档解析 soup = BeautifulSoup(url_contain) params = soup.findAll('div',{'class':'listBlk'}) if os.path.exists(projectpath+'stock\\'+'link'): pass else: os.mkdir(projectpath+'stock\\'+'link') filename='link.txt' 
path=projectpath+'stock\\link\\' + filename filelink=open(path,'a+') for params_item in params: persons = params_item.findAll('li') for item in persons: href=item.find('a') mil_link= href.get('href') filelink.write(str(mil_link)+'\n') #递归调用parse,传入新的爬取url yield Request(mil_link, callback=self.parse) page=page+1 #对单个新闻页面新建线程进行爬取 if flag!=1: if (response.status != 404) and (response.status != 502): thread.start_new_thread(loop,(response,))

A beginner's Python scraper suddenly stopped working. Is the IP banned, or is it something else?

# 编写的python小程序,爬取豆瓣评论,昨天还可以用,今天就失效了,试过很多种解决方法,都没有成功,求教? ## 可能的问题是ip被封或者cookies? 主程序 ``` # -*- coding: utf-8 -*- import ReviewCollection from snownlp import SnowNLP from matplotlib import pyplot as plt #画饼状图 def PlotPie(ratio, labels, colors): plt.figure(figsize=(6, 8)) explode = (0.05,0) patches,l_text,p_text = plt.pie(ratio,explode=explode,labels=labels,colors=colors, labeldistance=1.1,autopct='%3.1f%%',shadow=False, startangle=90,pctdistance=0.6) plt.axis('equal') plt.legend() plt.show() def main(): #初始url url = 'https://movie.douban.com/subject/30176393/' #保存评论文件 outfile = 'review.txt' (reviews, sentiment) = ReviewCollection.CollectReivew(url, 20, outfile) numOfRevs = len(sentiment) print(numOfRevs) #print(sentiment) positive = 0.0 negative = 0.0 accuracy = 0.0 #利用snownlp逐条分析每个评论的情感 for i in range(numOfRevs): # if sentiment[i] == 1: # positive += 1 # else: # negative += 1 print(reviews[i]+str(i)) sent = SnowNLP(reviews[i]) predict = sent.sentiments #print(predict,end=' ') if predict >= 0.5: positive += 1 if sentiment[i] == 1: accuracy += 1 else: negative += 1 if sentiment[i] == 0: accuracy += 1 #计算情感分析的精度 print('情感预测精度为: ' + str(accuracy/numOfRevs)) # print(positive,negative) #绘制饼状图 #定义饼状图的标签 labels = ['Positive Reviews', 'Negetive Reviews'] #每个标签占的百分比 ratio = [positive/numOfRevs, negative/numOfRevs] # print(ratio[0],ratio[1]) colors = ['red','yellowgreen'] PlotPie(ratio, labels, colors) if __name__=="__main__": main() ``` 次程序 ``` #!/usr/bin/python # -*- coding: utf-8 -*- from bs4 import BeautifulSoup import requests import csv import re import time import codecs import random def StartoSentiment(star): ''' 将评分转换为情感标签,简单起见 我们将大于或等于三星的评论当做正面评论 小于三星的评论当做负面评论 ''' score = int(star[-2]) if score >= 3: return 1 else: return 0 def CollectReivew(root, n, outfile): ''' 收集给定电影url的前n条评论 ''' reviews = [] sentiment = [] urlnumber = 0 headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36','Connection': 'close','cookie': 'll="108303"; bid=DOSjemTnbi0; _pk_ses.100001.4cf6=*; ap_v=0,6.0; __utma=30149280.1517093765.1576143949.1576143949.1576143949.1; __utmb=30149280.0.10.1576143949; __utmc=30149280; __utmz=30149280.1576143949.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utma=223695111.1844590374.1576143949.1576143949.1576143949.1; __utmc=223695111; __utmz=223695111.1576143949.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __yadk_uid=iooXpNnGnHUza2r4ru7uRCpa3BXeHG0l; dbcl2="207917948:BFXaC6risAw"; ck=uFvj; _pk_id.100001.4cf6=4c11da64dc6451d3.1576143947.1.1576143971.1576143947.; __utmb=223695111.2.10.1576143949'} proxies = { "http":'http://121.69.46.177:9000',"https": 'https://122.136.212.132:53281'}#121.69.46.177:9000218.27.136.169:8085 122.136.212.132:53281 while urlnumber < n: url = root + 'comments?start=' + str(urlnumber) + '&limit=20&sort=new_score&status=P' print('要收集的电影评论网页为:' + url) # try: html = requests.get(url, headers = headers, proxies = proxies,timeout = 15) # # except Exception as e: # break soup = BeautifulSoup(html.text.encode("utf-8"),'html.parser') #通过正则表达式匹配评论和评分 for item in soup.find_all(name='span',attrs={'class':re.compile(r'^allstar')}): sentiment.append(StartoSentiment(item['class'][0])) #for item in soup.find_all(name='p',attrs={'class':''}): # if str(item).find('class="pl"') < 0: # r = str(item.string).strip() # reviews.append(r) comments = soup.find_all('span','short') for comment in comments: # print(comment.getText()+'\n') reviews.append(comment.getText()+'\n') 
urlnumber = urlnumber + 20 time.sleep(5) with codecs.open(outfile, 'w', 'utf-8') as output: for i in range(len(sentiment)): output.write(reviews[i] + '\t' + str(sentiment[i]) + '\n') return (reviews, sentiment) ``` ![图片说明](https://img-ask.csdn.net/upload/201912/12/1576149313_611712.jpg) 不设置参数proxies时错误如下:![图片说明](https://img-ask.csdn.net/upload/201912/12/1576149408_985833.jpg) 求教解决方法,感谢!!!!

Python 3: pdfminer3k extracts PDF files incompletely. Is there any way to fix this?

最近在用Python爬交易所公告的PDF文件,参考了论坛上各位大神的介绍,安装了pdfminer3k,并成功解析了PDF文件。不过我发现有些PDF文件解析的时候只能解析一部分内容出来,大段的文字没有解析出来,请问是什么问题,有什么解决方案吗?查了好久没找到类似的问题,感谢大家! 下面是我的代码: # -*- coding: utf-8 -*- from urllib.request import Request from urllib.request import quote from urllib.request import urlopen import pandas as pd from pdfminer.converter import PDFPageAggregator from pdfminer.layout import LTTextBoxHorizontal, LAParams from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter from pdfminer.pdfinterp import PDFTextExtractionNotAllowed from pdfminer.pdfparser import PDFParser, PDFDocument headers = {'content-type': 'application/json', 'Accept-Encoding': 'gzip, deflate', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0'} baseurl = "http://" def parse(docucode, txtcode): try: # 打开在线PDF文档 #_path = baseurl + quote(docucode) + "?random=0.3006649122149502" _path = baseurl + quote(docucode) request = Request(url=_path, headers=headers) # 随机从user_agent列表中抽取一个元素 fp = urlopen(request,timeout=500) #timeout设置超时的时间,防止出现访问超时问题 # 读取本地文件 # path = './2015.pdf' # fp = open(path, 'rb') # 用文件对象来创建一个pdf文档分析器 praser_pdf = PDFParser(fp) # 创建一个PDF文档 doc = PDFDocument() # 连接分析器 与文档对象 praser_pdf.set_document(doc) doc.set_parser(praser_pdf) # 提供初始化密码doc.initialize("123456") # 如果没有密码 就创建一个空的字符串 doc.initialize() # 检测文档是否提供txt转换,不提供就忽略 if not doc.is_extractable: raise PDFTextExtractionNotAllowed else: # 创建PDf资源管理器 来管理共享资源 rsrcmgr = PDFResourceManager() # 创建一个PDF参数分析器 laparams = LAParams() # 创建聚合器 device = PDFPageAggregator(rsrcmgr, laparams=laparams) # 创建一个PDF页面解释器对象 interpreter = PDFPageInterpreter(rsrcmgr, device) # 循环遍历列表,每次处理一页的内容 # doc.get_pages() 获取page列表 for page in doc.get_pages(): # 使用页面解释器来读取 interpreter.process_page(page) # 使用聚合器获取内容 layout = device.get_result() # 这里layout是一个LTPage对象 里面存放着 这个page解析出的各种对象 一般包括LTTextBox, # LTFigure, LTImage, LTTextBoxHorizontal 等等 想要获取文本就获得对象的text属性, for out in layout: # 判断是否含有get_text()方法,图片之类的就没有 # if ``hasattr(out,"get_text"): docname = str(txtcode).split('.')[0]+'.txt' with open(docname, 'a') as f: if isinstance(out, LTTextBoxHorizontal): results = out.get_text() #print(results) f.write(results) except Exception as e: #抛出超时异常 print("a", str(e)) pdfurl = 'www.sse.com.cn/disclosure/credibility/supervision/inquiries/opinion/c/8135857143683813.pdf' txtname = 'ceshi' parse(pdfurl, txtname)
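One possible cause, offered only as a guess since the specific PDF has not been checked here: the loop above writes out only LTTextBoxHorizontal objects, so any text that pdfminer nests inside other layout containers (figures, for example) never reaches f.write. A hedged variant of the inner loop that walks the layout tree recursively, relying only on get_text() and iterability, might look like this:

```
def collect_text(layout_obj, pieces):
    """Recursively gather text from a pdfminer layout tree (hedged sketch)."""
    if hasattr(layout_obj, "get_text"):
        pieces.append(layout_obj.get_text())
        return
    try:
        children = iter(layout_obj)      # containers such as figures are iterable
    except TypeError:
        return                           # images and similar objects: nothing to extract
    for child in children:
        collect_text(child, pieces)

# Inside the per-page loop, instead of filtering on LTTextBoxHorizontal:
#     pieces = []
#     collect_text(layout, pieces)
#     f.write("".join(pieces))
```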

Help: Python reports "AttributeError: module 'log0' has no attribute 'out'". What should I do?

python代码: ``` from downloader import Downloader #, cStringIO, cPickle from threading import Thread from time import sleep import log0 as log from os.path import basename import requests as req import pickle from os.path import exists db='E:/tmp/download.data' def append(obj): try: if exists(db): with open(db,'rb') as f: data=pickle.load(f) else: data={} except: data={} data[obj['url']]=obj with open(db,'wb') as f: pickle.dump(data,f) def load(url): if not exists(db): return None try: with open(db,'rb') as f: data=pickle.load(f) return data.get(url) except: return None def out(msg): print(msg) import time from os.path import basename, exists, getsize from queue import Queue from threading import Lock, Thread, current_thread import requests as req import random as rand import conf class Downloader: KB=1024 MB=KB*KB GB=KB*MB range_size=MB max_workers=10 spd_refresh_interval=1 user_agents=[ 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36' 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36', 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0' ] chunk_size=KB max_error=0.1 #单线程允许最大出错率 max_error_one_worker=0.5 #仅剩一个线程时允许的最大出错率 home='E:/tmp/' #下载目录 def __init__(self,c): self.__locks={i:Lock() for i in ('file','worker_info','itr_job','download_info')} self.__config=c self.__alive=False self.__fails=Queue() self.__conf=c c=conf.load(c['url']) if c: self.__conf=c self.__init_from_conf() else: self.__init_task() def __init_from_conf(self): self.__download_offset=self.__conf['offset'] for i in self.__conf['fails']: self.__fails.put(i) def __get_agent(self): return self.user_agents[rand.randint(0,len(self.user_agents)-1)] def __init_task(self): headers={'Range':'bytes=0-0'} headers['User-Agent']=self.__get_agent() print(headers) try: r=req.get(self.__conf['url'],headers=headers,stream=True) self.__conf['name'] = basename(self.__conf['url']) or str(int(round(time.time()*1000))) self.__conf['206'] = r.status_code == 206 or r.headers.get('Accept-Ranges')=='bytes' if self.__conf['206']: self.__conf['len']=int(r.headers['Content-Range'].split('/')[-1]) elif r.status_code!=200: log.out('init task err') return else: self.__conf['len']=int(r.headers['Content-Length']) r.close() self.__download_offset=0 self.__conf['init']=True except Exception as e: log.out(e) def __itr_job(self): if self.__locks['itr_job'].acquire(): if not self.__fails.empty(): ans=self.__fails.get() elif self.__download_offset<self.__conf['len']: o=self.__download_offset ans=(o,min(self.__conf['len']-1,o+self.range_size-1)) self.__download_offset+=self.range_size else: ans=(-1,-1) self.__locks['itr_job'].release() return ans def __has_job(self): if self.__locks['itr_job'].acquire(): ans=self.__download_offset<self.__conf['len'] or not self.__fails.empty() self.__locks['itr_job'].release() return ans def __download_no_206(self): headers={'User-Agent':self.__get_agent()} r=req.get(self.__conf['url'],headers=headers,stream=True) self.__download_offset=0 if r.status_code != 200: r.close() self.__stopped() return try: for con in r.iter_content(chunk_size=self.chunk_size): if self.__kill_signal: break 
                    self.__file.write(con)
                    l=len(con)
                    self.__down_bytes+=l
                    self.__download_offset+=l
                    t0=time.time()
                    t=t0-self.__last_time
                    if t>=self.spd_refresh_interval:
                        self.__down_spd=self.__down_bytes/t
                        log.out('downspd: %d KB/s'%(self.__down_spd/self.KB))
                        self.__last_time=t0
                        self.__down_bytes=0
        except:
            pass
        r.close()
        self.__stopped()

    def __download_206(self):
        file_len=self.__conf['len']
        total=0
        error=0
        kill=False
        with req.session() as sess:
            while True:
                s,e=self.__itr_job()
                if s==-1:
                    log.out('no job stop')
                    break
                headers={'Range':'bytes=%d-%d'%(s,e)}
                headers['User-Agent']=self.__get_agent()
                try:
                    r=sess.get(self.__conf['url'],headers=headers,stream=True)
                    total+=1
                    if r.status_code!=206:
                        self.__fails.put((s,e))
                        error+=1
                        if error>self.max_error*total:
                            if self.__locks['worker_info'].acquire():
                                num=self.__current_workers
                                self.__locks['worker_info'].release()
                            if error>self.max_error_one_worker*total or num>1:
                                break
                        continue
                    for con in r.iter_content(chunk_size=self.chunk_size):
                        if self.__locks['worker_info'].acquire():
                            if self.__kill_signal:
                                self.__locks['worker_info'].release()
                                kill=True
                                break
                            self.__locks['worker_info'].release()
                        if self.__locks['file'].acquire():
                            self.__file.seek(s)
                            self.__file.write(con)
                            l=len(con)
                            s+=l
                            self.__locks['file'].release()
                        if self.__locks['download_info'].acquire():
                            self.__down_bytes+=l
                            t0=time.time()
                            t=t0-self.__last_time
                            if t>=self.spd_refresh_interval:
                                log.out('downspd: %d KB/s'%(self.__down_spd/self.KB))
                                self.__down_spd=self.__down_bytes/t
                                self.__down_bytes=0
                                self.__last_time=t0
                            self.__locks['download_info'].release()
                    if s<=e and s<file_len:
                        self.__fails.put((s,e))
                    if kill:
                        break
                except:
                    self.__fails.put((s,e))
                    error+=1
                    if error>self.max_error*total:
                        if self.__locks['worker_info'].acquire():
                            num=self.__current_workers
                            self.__locks['worker_info'].release()
                        if error>self.max_error_one_worker*total or num>1:
                            break
        self.__stopped()

    def __start_worker(self,target):
        if self.__locks['worker_info'].acquire():
            if self.__kill_signal:
                self.__locks['worker_info'].release()
                return False
            if self.__current_workers<self.max_workers:
                Thread(target=target).start()
                self.__current_workers+=1
                log.out('new worker started,current workers %d'%self.__current_workers)
            self.__locks['worker_info'].release()
            return True

    def __start_workers(self):
        for _ in range(self.max_workers):
            if not self.__start_worker(self.__download_206):
                break
            time.sleep(0.8)

    def start(self):
        if self.__alive:
            log.out('already started!')
            return
        if self.__conf.get('status')=='done':
            log.out('already done')
            return
        self.__alive=True
        self.__kill_signal=False
        self.__conf['status']='working'
        self.__down_bytes=0
        self.__down_spd=0
        self.__last_time=0
        self.__current_workers=0
        self.__start_time=time.time()
        try:
            path=self.home+self.__conf['name']
            self.__file=open(path,(exists(path) and 'rb+') or 'wb' )
            if not self.__conf['206']:
                Thread(target=self.__start_workers).start()
            else:
                self.__start_worker(self.__download_no_206)
            log.out('starting done!')
        except:
            # the bare except hides the real reason start() failed (see the traceback below)
            log.out('starting failed')

    def stop(self):
        if self.__kill_signal:
            return
        log.out('stopping')
        if self.__locks['worker_info'].acquire():
            self.__kill_signal=True
            if self.__conf['status']=='working':
                self.__conf['status']='stopped'
            self.__locks['worker_info'].release()

    def __after_stopped(self):
        if not self.__kill_signal:
            self.__kill_signal=True
        __alive=False
        self.__file.close()
        log.out('total time: %.2f'%(time.time()-self.__start_time))
        self.__conf['offset']=self.__download_offset
        if not self.__has_job():
            self.__conf['status']='done'
        elif self.__conf.get('status')!='stopped':
            self.__conf['status']='error'
        leak=0
        ls=[]
        while not self.__fails.empty():
            i=self.__fails.get()
            leak+=i[1]-i[0]+1
            ls.append(i)
        self.__conf['fails']=ls
        leak+=max(self.__conf['len']-self.__download_offset,0)
        log.out('total leak: %d'%leak)
        conf.append(self.__conf)

    def __stopped(self):
        if self.__locks['worker_info'].acquire():
            self.__current_workers-=1
            log.out('%s stopped'%current_thread().name)
            if self.__current_workers==0:
                self.__after_stopped()
            self.__locks['worker_info'].release()


# ===== second script posted with the question: the crawler that builds the wheel URLs and calls Downloader =====
#!/usr/bin/env python
# coding=utf-8
#import importlib,sys
#import sys
#sys.setdefaultencoding('gbk')
'''import sys
import imp
import sys
reload(sys)
sys.setdefaultencoding('utf8')
'''
'''
import sys
sys.setdefaultencoding('utf-8')
import jieba
import json'''
def main():
    from bs4 import BeautifulSoup
    import urllib.request
    import urllib.parse as parse
    import ssl
    import re
    import os,os.path
    import codecs
    import requests

    def getHtml(url):
        global html
        page = urllib.request.urlopen(url)
        html = page.read()
        return html

    def file(url1,file_name,name):
        print(url1)
        #file(name,save_path,filename)
        #url1= +'/' + filename
        url1=url1.encode()
        #file = open(name ,'wb+')
        #file.write(url1 )
        #file.close()
        #print(file_name)
        headers = {'Host': 'https://files.pythonhosted.org/packages/','User-Agent':'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER','Referer': 'https://pypi.org/', 'Connection': 'keep-alive', 'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'Accept-Encoding': 'gzip, deflate, sdch, br', 'Accept-Language': 'zh-CN,zh;q=0.8'}
        #req = urllib.urlretrieve(download_url,headers=headers)
        #urllib.request.urlopen('https://www.lfd.uci.edu/~gohlke/pythonlibs/')
        #req = urllib.request.Request(url=url,headers=header)
        #request = urllib.request.urlopen(url1)
        #response = urllib.request.urlopen(request)
        import socket
        import urllib.request
        # set the socket timeout (the original comment says 30 s; the call below passes 5)
        socket.setdefaulttimeout(5)
        # work around incomplete downloads without getting stuck in an endless loop
        '''try:
            urllib.request.urlretrieve(url1.decode(),name)
        except socket.timeout:'''
        count = 1
        while count <= 1:
            import time
            # format as 2016-03-20 11:45:39
            print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
            # format as Sat Mar 28 22:24:24 2016
            print(time.strftime("%a %b %d %H:%M:%S %Y", time.localtime()))
            # convert a formatted string back to a timestamp
            a = "Sat Mar 28 22:24:24 2016"
            print(time.mktime(time.strptime(a,"%a %b %d %H:%M:%S %Y")))
            try:
                urllib.request.urlretrieve(url1.decode(),name)
                print('\nchangshi'+str(count)+'over\n')
                break
            except socket.timeout:
                err_info = 'Reloading for %d time'%count if count == 1 else 'Reloading for %d times'%count
                print(err_info)
                count += 1
            except urllib.error.HTTPError:
                print('urllib.error.HTTPError')
            except urllib.error.URLError:
                print('urllib.error.URLError')
            except ssl.SSLWantReadError:
                print('ssl.SSLWantReadError')
        if count > 1:
            print("downloading picture fialed!")
            #urllib.request.urlretrieve(url1.decode(),name)
        global i
        i += 1
        print(url1.decode())
        #file = open(name ,'wt+')
        #file.write(str(req.content()))
        #file.close()
        print(file_name)
        global x
        print("Completed : .... %d ..." % x)
        '''for i in range(len(name_list)):
            j=0
            if name_list[i-24:i+1]=='https://pypi.org/project/':
                name_list1.append(name_list[i+1:i+60])'''
        print('\n........'+name+'..........complete\n')
        '''headers = {'Host': 'download.lfd.uci.edu','User-Agent':'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER','Referer': 'https://www.lfd.uci.edu/~gohlke/pythonlibs/', 'Connection': 'keep-alive', 'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'Accept-Encoding': 'gzip, deflate, sdch, br', 'Accept-Language': 'zh-CN,zh;q=0.8'}
        #req = urllib.urlretrieve(download_url,headers=headers)
        #urllib.request.urlopen('https://www.lfd.uci.edu/~gohlke/pythonlibs/')
        #req = urllib.request.Request(url=url,headers=header)
        request = requests.get(url=url1,headers=headers)
        #response = urllib.request.urlopen(request)
        global i
        i += 1
        file = open(name ,'wb+')
        file.write(request.content)
        file.close()
        print(file_name)
        print("Completed : .... %d ..." % x)'''

    save_path = os.getcwd()
    url = 'https://www.lfd.uci.edu/'
    html = getHtml(url)
    html='''
</li>
<li><a id="imagecodecs-lite"></a><strong><a href="https://www.lfd.uci.edu/~gohlke/#python">Imagecodecs-lite</a></strong> (deprecated): a subset of <a href="https://www.lfd.uci.edu/~gohlke/pythonlibs/#imagecodecs">imagecodecs</a>.
<ul>
<li><a href="javascript:;" onclick=" javascript:dl([101,99,106,112,118,103,115,49,47,119,116,45,104,111,95,51,48,108,105,50,53,101,113,109,97,46,110,121,100], &quot;5B1E23C97AFG4D0&lt;KD05=@A9D:B?B?H6H&gt;6:2J&gt;:I&lt;ID:GIJH8;@&quot;); &quot;javascript: dl(&quot;" title="[1 KB] [Feb 17, 2020]">imagecodecs_lite‑2020.1.31‑py3‑none‑any.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,97,51,110,111,49,45,116,106,101,99,113,105,119,50,108,95,115,48,52,100,118,56,53,47,54,112,103,104,46,57,109], &quot;@=7:IDF6G;N0J893C89@?&gt;;685=A4ML4=L159I1E59I1E5&lt;;2?0NCHBL&lt;K&gt;&quot;); &quot;javascript: dl(&quot;" title="[148 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp38‑cp38‑win_amd64.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,49,47,119,100,46,48,110,99,115,50,104,45,57,51,111,108,113,97,56,112,95,106,109,103,116,105,101,53,118], &quot;89E@CLKH1IFAGJ7&gt;3J78D?IHJ;950&lt;4094=;7C=B;7C=B;2I6=942:?&quot;); &quot;javascript: dl(&quot;" title="[120 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp38‑cp38‑win32.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,99,112,108,105,119,50,109,103,113,110,45,57,55,48,49,115,118,47,100,111,53,51,95,97,104,101,116,54,46,106,52], &quot;?5M81@DJA36G7I0CBI0?F23JI:5=&gt;;L&gt;5LE:01E&lt;:01E&lt;6:439FG6BKNL4H2&quot;); &quot;javascript: dl(&quot;" title="[145 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp37‑cp37m‑win_amd64.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,57,47,55,112,101,49,105,106,115,95,99,51,116,50,110,113,53,45,48,108,97,46,118,119,109,111,104,100,103], &quot;8=7?3F@&lt;16HDL4:IK4:89C6&lt;4A=B50E5=E;A:3;2A:3;2HAG6&gt;;=EGJC&quot;); &quot;javascript: dl(&quot;" title="[118 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp37‑cp37m‑win32.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,115,119,50,54,47,111,112,101,48,105,103,110,100,53,109,99,45,46,97,106,51,52,118,104,95,49,116,108,113,57], &quot;02CL6F=J49&gt;B:7?5&lt;7?0HK9J7@28IMAI2AD@?6D3@?6D3&gt;@19;HB&gt;&lt;3EA1GK&quot;); &quot;javascript: dl(&quot;" title="[137 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp36‑cp36m‑win_amd64.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,50,110,116,51,97,111,104,45,95,57,103,101,99,119,115,105,118,54,108,113,112,100,109,106,49,47,46,53,48], &quot;&gt;0GCD@K2I?F4:;&lt;5E;&lt;&gt;8B?2;70LH9JH0J37&lt;D3A7&lt;D3AF7=?130J=6B&quot;); &quot;javascript: dl(&quot;" title="[112 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp36‑cp36m‑win32.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,53,49,51,113,100,108,47,52,118,54,46,106,105,109,99,57,112,103,97,101,110,115,48,95,104,119,50,45,116,111], &quot;EJ;3@80L6&gt;@206&lt;=BAC&gt;M4C&gt;EG5&lt;LCKJF1?:1J:2K&gt;@20K&gt;@20=KI&lt;DGB=497:IH5&quot;); &quot;javascript: dl(&quot;" title="[133 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp35‑cp35m‑win_amd64.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,46,106,118,49,101,51,109,119,108,47,104,116,53,99,113,50,105,45,100,57,111,112,48,115,95,110,97,103], &quot;G?1&gt;E2&lt;;9=E5&lt;9@6JK4=DB4=GH8@;4A?F3C03?05A=E5&lt;A=E5&lt;6A7@I5?07:8&quot;); &quot;javascript: dl(&quot;" title="[110 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp35‑cp35m‑win32.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,54,118,48,53,115,45,112,116,110,106,51,46,100,108,99,52,109,55,50,105,47,49,104,97,119,113,103,111,101,95,57], &quot;4B9I6137D&gt;6BADC@GJL&gt;K&lt;L&gt;4M=C7L5B2EN;EB;:5&gt;6BA5&gt;6BA@5HC8MG@&lt;0?;HF=&quot;); &quot;javascript: dl(&quot;" title="[145 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp27‑cp27m‑win_amd64.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,115,51,118,57,104,112,50,110,48,103,109,55,95,49,100,108,116,101,106,45,113,53,46,47,111,99,119,97,105], &quot;06BD52E@GI56;GL:K9AIH&gt;AI0&lt;?L@AC68=3F=6F1CI56;CI56;:CJL716FJ4?&quot;); &quot;javascript: dl(&quot;" title="[120 KB] [Dec 04, 2019]">imagecodecs_lite‑2019.12.3‑cp27‑cp27m‑win32.whl</a></li>
<li><a href="javascript:;" onclick=" javascript:dl([101,112,116,57,114,103,106,100,122,97,101,115,46,51,111,47,48,105,99,49,108,113,53,50,109,45,118], &quot;:F5D0IE1&gt;@G849A=69A:HC@19HF?B2;BF;&lt;;183;47&quot;); &quot;javascript: dl(&quot;" title="[1.1 MB] [Dec 04, 2019]">imagecodecs‑lite‑2019.12.3.tar.gz</a></li>
'''
    print('html done')
    #html.decode('utf-8')
    #print(html)
    '''headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1)AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
    r = requests.get(url, headers = headers)
    r.encoding = "utf-8"
    soup = BeautifulSoup(r.text, "html.parser")
    #html_mod=re.sub(pattern=".",repl=".",string=html.decode('utf-8'))
    for link in soup.find_all('a'):  # soup.find_all returns a list
        print(link.get('href'))
        #name_list+=link
    '''
    name_list = html  #soup.find_all('a')  #re.findall(r']">*-cp38-win_amd64.whl',html.decode('utf-8'))
    x=1
    files=os.listdir(save_path)
    print(files)
    print(type(name_list))
    name_list=str(name_list)
    name_list1=[]
    #print(name_list)
    #for name in name_list:
    k=0
    #    name[k]=str(name1[k])
    for i in range(len(name_list)):
        j=0
        if name_list[i-2:i+1]==']">':
            name_list1.append(name_list[i+1:i+60])
            global m
            if k<len(name_list1):
                for l in range(len(name_list1[k])):
                    if l-9>=0:
                        if name_list1[k][l-4:l]=='.whl' or name_list1[k][l-3:l]=='.gz' or name_list1[k][l-4:l]=='.zip':
                            j=1
                            m=l
                if j==1:
                    name_list1[k]=name_list1[k][0:m]
                k+=1
    '''if j==0:
        name_list.remove(name)'''
    #file_name = os.path.join(save_path ,name)
    i=0
    #print(name)
    print(name_list1)
    for name in name_list1:
        j=0
        for l in range(len(name)):
            if l-9>=0:
                if name[l-4:l]=='.whl' or name[l-3:l]=='.gz' or name[l-4:l]=='.zip':
                    j=1
                    m=l
        if j==1:
            name=name[0:m]
            k+=1
        if name in files:
            continue
        '''if name=='Delny‑0.4.1‑cp27‑none‑win_amd64.whl</a></li>\n<li>' or name==Delny‑0.4.1‑cp27‑none‑win32.whl</a></li> </ul> </:
            continue
        '''
        print('no:'+str(x))
        print('\ndownload '+name)
        # importlib.reload(sys)
        #imp.reload(sys)
        for l in range(len(name)):
            if l-9>=0:
                if name[l-4:l]=='.whl' or name[l-3:l]=='.gz' or name[l-4:l]=='.zip':
                    j=1
                    m=l
        if j==1:
            name=name[0:m]
            k+=1
        string='https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/' + name  #[0:4+name.find('.whl')]  #https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/
        print('00'+save_path)
        count=0
        v=0
        # walk the URL character by character, turning backslashes into slashes and
        # replacing the U+2011 non-breaking hyphens copied from the page with ASCII '-'
        for p in range(len(string)):
            if string[p]=='\\':
                if v==0:
                    string=string[:6]+'//'+string[7:]
                else:
                    string=string[:p]+'/'+string[p+1:]
                v+=1
            if string[p-3:p]=='win':
                string=string[:p-4]+'-'+string[p-3:]
            if p<len(string):
                if (string[p]=='\u2011')==True:
                    if p+1<len(string):
                        string=string[:p]+'-'+string[p+1:]
            '''if string[p-2]>='0' and string[p-2]<='9' and string[p-1]>='0' and string[p-1]<='9':
                if (string[p]>='a'and string[p]<='z') or (string[p]>='A'and string[p]<='Z'):
                    string=string[:p]+string[p+1:]'''
            if p>=len(string):
                break
        '''if name[:9]=='ad3‑2.2.1':
            print('aaa')
            continue'''
        conf={'url':string}
        d=Downloader(conf)
        d.start()
        #file(string,save_path,name)
        x=x+1
    print('09'+name_list)
    print('finished')

if __name__ == '__main__':
    main()
```

The error reported is:

    >>> ======================== RESTART: E:\2345Downloads\44.py =======================
    Warning: This project has moved to logzero (see https://github.com/metachris/logzero)
    html done
    <class 'str'>
    ['imagecodecs_lite‑2020.1.31‑py3‑none‑any.whl', 'imagecodecs_lite‑2019.12.3‑cp38‑cp38‑win_amd64.whl', 'imagecodecs_lite‑2019.12.3‑cp38‑cp38‑win32.whl', 'imagecodecs_lite‑2019.12.3‑cp37‑cp37m‑win_amd64.whl', 'imagecodecs_lite‑2019.12.3‑cp37‑cp37m‑win32.whl', 'imagecodecs_lite‑2019.12.3‑cp36‑cp36m‑win_amd64.whl', 'imagecodecs_lite‑2019.12.3‑cp36‑cp36m‑win32.whl', 'imagecodecs_lite‑2019.12.3‑cp35‑cp35m‑win_amd64.whl', 'imagecodecs_lite‑2019.12.3‑cp35‑cp35m‑win32.whl', 'imagecodecs_lite‑2019.12.3‑cp27‑cp27m‑win_amd64.whl', 'imagecodecs_lite‑2019.12.3‑cp27‑cp27m‑win32.whl', 'imagecodecs‑lite‑2019.12.3.tar.gz']
    no:1
    download imagecodecs_lite‑2020.1.31‑py3‑none‑any.whl
    00E:\2345Downloads
    Warning (from warnings module):
      File "C:\Users\ASUS\AppData\Local\Programs\Python\Python38\lib\site-packages\conf\reader.py", line 39
        warnings.warn('cannot parse files of type "%s"' % suffix)
    UserWarning: cannot parse files of type ".whl"
    {'Range': 'bytes=0-0', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
    Traceback (most recent call last):
      File "E:\2345Downloads\44.py", line 254, in start
        self.__file=open(path,(exists(path) and 'rb+') or 'wb' )
    FileNotFoundError: [Errno 2] No such file or directory: 'E:/tmp/imagecodecs_lite-2020.1.31-py3-none-any.whl'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "E:\2345Downloads\44.py", line 616, in <module>
        main()
      File "E:\2345Downloads\44.py", line 606, in main
        d.start()
      File "E:\2345Downloads\44.py", line 259, in start
        except: log.out('starting failed')
    AttributeError: module 'log0' has no attribute 'out'
    >>>

Hoping an expert can help me solve this.
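Two things stand out in that traceback. First, `start()` opens `self.home + self.__conf['name']` (here `E:/tmp/...`) without making sure the directory exists, so `open()` raises `FileNotFoundError`. Second, the bare `except: log.out('starting failed')` hides the real error and then crashes itself, because the `log0` module bound to `log` has no `out` function. A minimal sketch of the idea, using a standalone helper with hypothetical `home`/`name` parameters standing in for `self.home` and `self.__conf['name']`, and the standard `logging` module instead of the asker's `log`/`conf` helpers:

```python
import logging
import os
from os.path import exists

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("downloader")

def open_target_file(home, name):
    """Open the download target for random-access writes.

    `home` and `name` are hypothetical stand-ins for the asker's
    self.home and self.__conf['name'].
    """
    os.makedirs(home, exist_ok=True)          # the E:/tmp/ directory in the traceback did not exist
    path = os.path.join(home, name)
    mode = 'rb+' if exists(path) else 'wb'    # reopen in place when resuming an existing file
    try:
        return open(path, mode)
    except OSError:
        # log the real reason instead of swallowing it with a bare except
        log.exception('starting failed: could not open %s', path)
        raise
```

Creating the directory up front and re-raising after logging keeps the original exception visible, instead of masking it with a second, unrelated `AttributeError`.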
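The long character-by-character loop over `string` exists mainly to turn the U+2011 non-breaking hyphens that the lfd.uci.edu page uses in its file names into plain ASCII hyphens, which fits the encoding theme of this page. A simpler sketch, with hypothetical helper names (only the base URL is taken from the original code):

```python
from urllib.parse import quote

BASE = 'https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/'   # same base URL as in the question

def build_download_url(wheel_name):
    """Map a file name scraped from the page to a valid download URL."""
    ascii_name = wheel_name.replace('\u2011', '-')   # U+2011 NON-BREAKING HYPHEN -> ASCII '-'
    return BASE + quote(ascii_name)                  # percent-encode anything else that is unsafe

# example
print(build_download_url('imagecodecs_lite\u20112020.1.31\u2011py3\u2011none\u2011any.whl'))
# https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/imagecodecs_lite-2020.1.31-py3-none-any.whl
```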
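The `file()` helper sets a short socket timeout and prints a "Reloading" message, but `while count <= 1` only ever allows a single attempt, so no retry actually happens. A sketch of a retry loop in the same urllib/socket style (the function name and parameters are illustrative, not from the original code):

```python
import socket
import urllib.request
from urllib.error import HTTPError, URLError

def fetch_with_retry(url, dest, attempts=3, timeout=30):
    """Download `url` to `dest`, retrying only on socket timeouts."""
    socket.setdefaulttimeout(timeout)
    for attempt in range(1, attempts + 1):
        try:
            urllib.request.urlretrieve(url, dest)
            return True
        except socket.timeout:
            print('timed out, attempt %d of %d' % (attempt, attempts))
        except (HTTPError, URLError) as err:
            print('download failed: %s' % err)   # a hard error, no point retrying
            return False
    return False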
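For reference, the Range/206 pattern that `__download_206` implements with worker threads and locks can be exercised single-threaded in a few lines. This is only a sketch of the idea with `requests`, not a drop-in replacement for the asker's class:

```python
import os
import requests

def resume_download(url, path, chunk_size=64 * 1024):
    """Ask the server only for the bytes we do not have yet and append them.

    Status 206 means the server honoured the Range header; a plain 200 means
    it sent the whole file again, so we rewrite from scratch.
    """
    done = os.path.getsize(path) if os.path.exists(path) else 0
    headers = {'Range': 'bytes=%d-' % done}
    with requests.get(url, headers=headers, stream=True, timeout=30) as r:
        r.raise_for_status()
        mode = 'ab' if r.status_code == 206 else 'wb'
        with open(path, mode) as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```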
