Why does get_url() show "Unable to load source "<string>": Source unavailable" when execution reaches response = requests.get(url, headers=headers)?

import json
import random
import time

import requests
from fake_useragent import UserAgent
from lxml import etree


def get_url(url):
    """Fetch a page with a random User-Agent; return its HTML, or None on a non-200 status."""
    headers = {
        "User-Agent": UserAgent().random
    }
    # Random pause so requests are not sent in rapid succession.
    time.sleep(random.randint(3, 9))
    # The timeout keeps the call from hanging forever on a stalled connection.
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = 'utf8'
    if response.status_code == 200:
        return response.text
    return None


def parse_index(html):
    """Return the detail-page URLs found on one listing page."""
    e = etree.HTML(html)
    movie_base_url = "https://maoyan.com{}?catId=3&showType=3"
    all_url = e.xpath('//div[@class="movie-item film-channel"]/a/@href')
    return [movie_base_url.format(url) for url in all_url]


def parse_info(html):
    """Extract the text fields of one movie detail page."""
    e = etree.HTML(html)

    def text(path):
        # string() yields the text of the first matching node ('' if nothing matches).
        return e.xpath('string({})'.format(path)).strip()

    return {
        "name": text('//h1[@class="name"]'),
        "type": text('//li[@class="ellipsis"][1]'),
        "country_duration": text('//li[@class="ellipsis"][2]'),
        "year": text('//li[@class="ellipsis"][3]'),
        "introduce": text('//span[@class="dra"]')
    }


def main():
    """
    Disabled multi-page version:
    base_url = "https://maoyan.com/films?catId=3&showType=3&offset={}"
    for i in range(0, 3):
        new_url = base_url.format(i*30)
        html = get_url(new_url)
        movie_urls = parse_index(html)
        for movie_url in movie_urls:
            movie_html = get_url(movie_url)
            movie_info = parse_info(movie_html)
            with open('movie.txt', 'a', encoding='utf8') as f:
                f.write(json.dumps(movie_info, ensure_ascii=False) + "\n")
    """
    base_url = "https://maoyan.com/films?catId=3&showType=3&offset=0"
    html = get_url(base_url)
    if html is None:
        # The listing page did not return 200, so there is nothing to parse.
        return
    movie_urls = parse_index(html)
    for movie_url in movie_urls:
        movie_html = get_url(movie_url)
        if movie_html is None:
            continue
        movie_info = parse_info(movie_html)
        with open('movie.txt', 'a', encoding='utf8') as f:
            # write() takes a single string, so serialize the dict first.
            f.write(json.dumps(movie_info, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    main()
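
Separately from the debugger message, note that a file object's `write()` accepts a single string and takes no `encoding` argument, so the dictionary returned by `parse_info()` has to be serialized before it is written. A minimal sketch, assuming one JSON object per line is an acceptable format for movie.txt; the helper name `append_record` is hypothetical and not part of the original code:

```python
import json


def append_record(path, record):
    """Append one dict to a text file as a single JSON line (hypothetical helper)."""
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    line = json.dumps(record, ensure_ascii=False)
    # The encoding belongs on open(), not on write().
    with open(path, "a", encoding="utf8") as f:
        f.write(line + "\n")
    return line
```

This only works when the values are plain strings or other JSON types; lxml element objects would need their text extracted first.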

1 answer

binbincoder 2020-04-27 20:56
proxy = {'http': 'http://' + proxy}  # proxy is an "ip:port" string
web_content = requests.get(urls, headers=headers, proxies=proxy, timeout=10)
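
The snippet above reuses the name `proxy` for both the `ip:port` string and the resulting dictionary, and it sets only the `http` key. requests picks a proxy by URL scheme, so an `https://` URL such as the Maoyan pages in the question would likely need an `https` entry as well. A minimal sketch, assuming a plain HTTP proxy given as `ip:port`; `build_proxies` and the address below are hypothetical placeholders:

```python
def build_proxies(address):
    """Map both URL schemes to one HTTP proxy given as 'ip:port' (hypothetical helper)."""
    proxy_url = "http://" + address
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("127.0.0.1:8080")  # placeholder address, not a real proxy

# Usage with requests (not executed here; url and headers come from the question's code):
# response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
```

Keeping the address string and the dictionary under separate names avoids overwriting the original value.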