Question: Chinese characters in the URL when making requests in Python 3.7

import string
import urllib.request
import json
import time
from urllib.parse import quote

ISOTIMEFORMAT='%Y-%m-%d %X'

outputFile = 'douban_movie.txt'
fw = open(outputFile, 'w', encoding='utf-8')
fw.write('id;title;url;cover;rate\n')

headers = {}
headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
headers["Accept-Encoding"] = "gzip, deflate, sdch"
headers["Accept-Language"] = "zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4,ja;q=0.2"

headers["Cache-Control"] = "max-age=0"

headers["Connection"] = "keep-alive"

headers["Cookie"] = 'bid="LJSWKkSUfZE"; ll="108296"; __utmt=1; regpop=1; _pk_id.100001.4cf6=32aff4d8271b3f15.1442223906.2.1442237186.1442224653.; _pk_ses.100001.4cf6=*; __utmt_douban=1; utma=223695111.736177897.1442223906.1442223906.1442236473.2; utmb=223695111.0.10.1442236473; utmc=223695111; utmz=223695111.1442223906.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); utma=30149280.674845100.1442223906.1442236473.1442236830.3; utmb=30149280.4.9.1442237186215; utmc=30149280; utmz=30149280.1442236830.3.2.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; ap=1'

headers["Host"] = "movie.douban.com"
headers["Referer"] = "http://movie.douban.com/"
headers["Upgrade-Insecure-Requests"] = 1
headers["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"

# Fetch the tags

request = urllib.request.Request(url="http://movie.douban.com/j/search_tags?type=movie")
response = urllib.request.urlopen(request)
tags = json.loads(response.read())['tags']

# Start crawling

print ("********** START **********")
print (time.strftime( ISOTIMEFORMAT, time.localtime() ))

for tag in tags:
    print ("Crawl movies with tag: " + tag)
    print (time.strftime( ISOTIMEFORMAT, time.localtime() ))

    start = 0
    while True:
        # Percent-encode the tag: the URL must be ASCII, and str + bytes raises TypeError in Python 3
        url = "http://movie.douban.com/j/search_subjects?type=movie&tag=" + urllib.parse.quote(tag) + "&page_limit=20&page_start=" + str(start)
        #url = quote(url, safe=string.printable)
        request = urllib.request.Request(url=url)
        response = urllib.request.urlopen(request)
        movies = json.loads(response.read())['subjects']
        if len(movies) == 0:
            break
        for item in movies:
            rate = item['rate']
            title = item['title']
            url = item['url']
            cover = item['cover']
            movieId = item['id']
            record = str(movieId) + ';' + title + ';' + url + ';' + cover + ';' + str(rate) + '\n'
            # fw is opened in text mode, so write the str directly instead of bytes
            fw.write(record)
            print (tag + '\t' + title)
        start = start + 20

fw.close()


1 answer

7*24 工作者 2019-06-03 14:30
You need to import urllib.request, not just urllib. In Python 3, `import urllib` alone does not load the `request` submodule.
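
Beyond the import, the Chinese tag itself still has to be percent-encoded before it goes into the URL. Here is a minimal sketch of one way to do that with `urllib.parse.urlencode`. The helper names `build_search_url` and `fetch_json` and the example tag "热门" are assumptions made for illustration, not part of the original script, and the endpoint is simply the one used in the question.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the question; whether it still responds is not checked here.
BASE_URL = "http://movie.douban.com/j/search_subjects"


def build_search_url(tag, start, page_limit=20):
    """Hypothetical helper: build the search URL with the tag percent-encoded as UTF-8."""
    query = urlencode({
        "type": "movie",
        "tag": tag,  # urlencode percent-encodes non-ASCII text as UTF-8 by default
        "page_limit": page_limit,
        "page_start": start,
    })
    return BASE_URL + "?" + query


def fetch_json(url):
    """Hypothetical helper: request the URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# "热门" is only an example tag; real tags would come from the search_tags request.
url = build_search_url("热门", 0)
print(url)
```

Inside the crawl loop, `fetch_json(build_search_url(tag, start))['subjects']` would replace the manual string concatenation. Building the query with `urlencode` also escapes any `&` or `=` that might appear in a tag, which plain concatenation does not.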
