moqiluo
2017-03-22 05:38

Garbled text (mojibake) when scraping a website with a Python crawler

  • crawler
  • python
  • website
  • garbled text
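import urllib.request
import chardet

# Page to fetch (the site address given below)
url = 'http://nc.mofcom.gov.cn/channel/gxdj/jghq/jg_list.shtml?par_craft_index=13075&craft_index=20413&startTime=2014-01-01&endTime=2014-03-31&par_p_index=&p_index=&keyword=&page=1'

user_agent = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'
headers = {'User-Agent': user_agent}
req = urllib.request.Request(url, headers=headers)
html_1 = urllib.request.urlopen(req, timeout=120).read()  # raw bytes
#html = str(response.read(),'utf-8')

# Guess the encoding from the raw bytes; chardet may return None
encoding_dict = chardet.detect(html_1)
web_coding = encoding_dict['encoding']
print(web_coding)

# In Python 3, decode bytes to str; re-encoding to UTF-8 would print as b'...'
if web_coding and web_coding.lower() == 'utf-8':
    html = html_1.decode('utf-8', 'ignore')
else:
    html = html_1.decode('gbk', 'ignore')
print(html)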

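    Site address:
    http://nc.mofcom.gov.cn/channel/gxdj/jghq/jg_list.shtml?par_craft_index=13075&craft_index=20413&startTime=2014-01-01&endTime=2014-03-31&par_p_index=&p_index=&keyword=&page=1

    Output shown: ![Image description](https://img-ask.csdn.net/upload/201703/22/1490160982_691178.png)

    I'm using Python 3 and have tried every method I could find online, but the output is still garbled. I don't know what else to try. Any help would be appreciated.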
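
For reference, here is a minimal sketch of one way to turn a raw response body into text in Python 3. It is not a confirmed fix for this particular site. It rests on two assumptions that the post does not verify: that the body may be gzip-compressed, and that the page is encoded in either UTF-8 or a GBK-family encoding. The helper name `decode_page` and its `declared_charset` parameter are made up for illustration.

```python
import gzip
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def decode_page(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Hypothetical helper: convert a raw HTTP body into a Python 3 str."""
    # Assumption: the server may send a gzip-compressed body. Compressed
    # bytes look like mojibake when decoded directly, so decompress first.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)

    # Try the charset declared by the server (if any), then UTF-8, then
    # GB18030, which also covers GBK and GB2312 text.
    for enc in (declared_charset, "utf-8", "gb18030"):
        if not enc:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue

    # Last resort: decode anyway and mark undecodable bytes visibly.
    return raw.decode("gb18030", errors="replace")


if __name__ == "__main__":
    sample = "农产品价格".encode("gbk")
    print(decode_page(sample))
```

With `urllib.request`, the declared charset can be read from the response via `resp.headers.get_content_charset()` and passed as `declared_charset`. The function returns a `str`, so `print` shows readable text rather than a `b'...'` bytes literal.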
