Problem with Chinese characters in a request URL in Python 3.7

import string
import urllib
import json
import time
from quopri import quote

ISOTIMEFORMAT='%Y-%m-%d %X'

outputFile = 'douban_movie.txt'
fw = open(outputFile, 'w')
fw.write('id;title;url;cover;rate\n')

headers = {}
headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
headers["Accept-Encoding"] = "gzip, deflate, sdch"
headers["Accept-Language"] = "zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4,ja;q=0.2"

headers["Cache-Control"] = "max-age=0"

headers["Connection"] = "keep-alive"

headers["Cookie"] = 'bid="LJSWKkSUfZE"; ll="108296"; __utmt=1; regpop=1; _pk_id.100001.4cf6=32aff4d8271b3f15.1442223906.2.1442237186.1442224653.; _pk_ses.100001.4cf6=*; __utmt_douban=1; utma=223695111.736177897.1442223906.1442223906.1442236473.2; utmb=223695111.0.10.1442236473; utmc=223695111; utmz=223695111.1442223906.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); utma=30149280.674845100.1442223906.1442236473.1442236830.3; utmb=30149280.4.9.1442237186215; utmc=30149280; utmz=30149280.1442236830.3.2.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; ap=1'

headers["Host"] = "movie.douban.com"
headers["Referer"] = "http://movie.douban.com/"
headers["Upgrade-Insecure-Requests"] = 1
headers["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"

# Fetch the list of tags

request = urllib.request.Request(url="http://movie.douban.com/j/search_tags?type=movie")
response = urllib.request.urlopen(request)
tags = json.loads(response.read())['tags']

# Start crawling

print ("********** START **********")
print (time.strftime( ISOTIMEFORMAT, time.localtime() ))

for tag in tags:
    print ("Crawl movies with tag: " + tag)
    print (time.strftime( ISOTIMEFORMAT, time.localtime() ))

    start = 0
    while True:
        url = "http://movie.douban.com/j/search_subjects?type=movie&tag=" +tag.encode("utf-8")+"&page_limit=20&page_start="+str(start)
        #url = quote(url, safe=string.printable)
        request = urllib.request.Request(url=url)
        response = urllib.request.urlopen(request)
        movies = json.loads(response.read())['subjects']
        if len(movies) == 0:
            break
        for item in movies:
            rate = item['rate']
            title = item['title']
            url = item['url']
            cover = item['cover']
            movieId = item['id']
            record = str(movieId) + ';' + title + ';' + url + ';' + cover + ';' + str(rate) + '\n'
            fw.write(record.encode('utf-8'))
            print (tag + '\t' + title)
        start = start + 20

fw.close()


1 answer

7*24 工作者 2019-06-03 14:30
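You need to import urllib.request, not just urllib. In Python 3, a bare "import urllib" does not load the request submodule, so urllib.request.Request can fail with an AttributeError.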
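Beyond the import, the Chinese tag itself has to be percent-encoded before it goes into the URL. The following is a minimal sketch of one way to do that with the standard library, not the asker's final code. The helper names build_search_url, quote_whole_url, fetch_subjects and format_record are hypothetical and introduced only for illustration; the endpoint and field names are taken from the question.

```python
import json
import string
import urllib.request                      # loads the request submodule explicitly
from urllib.parse import quote, urlencode  # quote lives here, not in quopri

BASE_URL = "http://movie.douban.com/j/search_subjects"


def build_search_url(tag, start, page_limit=20):
    """Build the search URL; urlencode percent-encodes the tag as UTF-8."""
    query = urlencode({
        "type": "movie",
        "tag": tag,
        "page_limit": page_limit,
        "page_start": start,
    })
    return BASE_URL + "?" + query


def quote_whole_url(url):
    """Alternative: escape only the non-ASCII characters of an assembled URL."""
    return quote(url, safe=string.printable)


def fetch_subjects(tag, start, headers=None):
    """Fetch one page of results (network I/O, so it is not called below)."""
    request = urllib.request.Request(build_search_url(tag, start),
                                     headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["subjects"]


def format_record(item):
    """Join the fields into one text line; this stays a str, not bytes."""
    fields = [str(item["id"]), item["title"], item["url"],
              item["cover"], str(item["rate"])]
    return ";".join(fields) + "\n"


if __name__ == "__main__":
    print(build_search_url("热门", 0))
```

With this approach tag.encode("utf-8") is no longer needed, since concatenating bytes to a str raises a TypeError in Python 3. Likewise, if the output file is opened with open(outputFile, 'w', encoding='utf-8'), the record can be written as a str without calling .encode(). If the headers from the question are passed to fetch_subjects, leaving out Accept-Encoding is probably safer, because urllib does not decompress gzip responses automatically.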

