uesr_agent = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0'
headers = {'User-agent':uesr_agent}
req = urllib.request.Request(url, headers = headers)
html_1 = urllib.request.urlopen(req, timeout=120).read()
#html = str(response.read(),'utf-8')
encoding_dict = chardet.detect(html_1)
#print encoding
web_coding = encoding_dict['encoding']
print (web_coding)
if web_coding == 'utf-8' or web_coding =='UTF-8':
html = html_1
else:
html = html_1.decode('gbk','ignore').encode('utf-8')
print (html)
网站地址:
http://nc.mofcom.gov.cn/channel/gxdj/jghq/jg_list.shtml?par_craft_index=13075&craft_index=20413&startTime=2014-01-01&endTime=2014-03-31&par_p_index=&p_index=&keyword=&page=1
显示信息:![图片说明](https://img-ask.csdn.net/upload/201703/22/1490160982_691178.png)
用的python3,把网上的方法都试了一遍,还是不行,不知道怎么办了,求助
python爬虫网站信息乱码问题
- 写回答
- 好问题 0 提建议
- 追加酬金
- 关注问题
- 邀请回答
-
3条回答
- oyljerry 2017-03-22 06:04关注
对应内容UTF8编码处理一下
In [5]: s = "\xe4\xbb\xb7\xe6\xa0\xbc\xe8\xa1\x8c\xe6\x83\x85".decode('utf8') In [6]: s Out[6]: u'\u4ef7\u683c\u884c\u60c5' In [7]: print(s) 价格行情
解决 无用评论 打赏 举报