一位不愿透露姓名的安东尼先生 2020-04-26 22:28 Acceptance rate: 0%
Views: 2223
Closed

Why do I get the message "Unable to load source "<string>": Source unavailable" when get_url() reaches response = requests.get(url, headers=headers)?

import requests, time, random
from fake_useragent import UserAgent
from lxml import etree

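def get_url(url):
    # Send a random User-Agent and pause between requests to go easy on the site.
    headers = {
        "User-Agent": UserAgent().random
    }
    time.sleep(random.randint(3, 9))
    # A timeout keeps the call from hanging forever if the server never answers.
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = 'utf8'
    # Return the page body only on HTTP 200; callers must handle None.
    if response.status_code == 200:
        return response.text
    return None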

def parse_index(html):
    e = etree.HTML(html)
    movie_base_url = "https://maoyan.com{}?catId=3&showType=3"
    all_url = e.xpath('//div[@class="movie-item film-channel"]/a/@href')
    return [movie_base_url.format(url) for url in all_url]

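def parse_info(html):
    e = etree.HTML(html)
    # string(...) yields the text of the first matching node ('' if there is none),
    # instead of a list of Element objects that cannot be written to a file.
    name = e.xpath('string(//h1[@class="name"])').strip()
    type_ = e.xpath('string(//li[@class="ellipsis"][1])').strip()
    contary_duration = e.xpath('string(//li[@class="ellipsis"][2])').strip()
    year = e.xpath('string(//li[@class="ellipsis"][3])').strip()
    introduce = e.xpath('string(//span[@class="dra"])').strip()
    return {
        "name": name,
        "type": type_,
        "contary_duration": contary_duration,
        "year": year,
        "introduce": introduce
    }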

def main():
    """
    base_url = "https://maoyan.com/films?catId=3&showType=3&offset={}"
    for i in range(0, 3):
        new_url = base_url.format(i*30)
        # time.sleep(random.randint(2,4))
        html = get_url(new_url)
        movie_urls = parse_index(html)
        for movie_url in movie_urls:
            movie_html = get_url(movie_url)
            movie_info = parse_info(movie_html)
            with open('movie.txt', 'a', encoding='utf8') as f:
                f.write(str(movie_info) + "\n")
    """
    base_url = "https://maoyan.com/films?catId=3&showType=3&offset=0"
    html = get_url(base_url)
    if html is None:  # get_url returns None on a non-200 response
        return
    movie_urls = parse_index(html)
    for movie_url in movie_urls:
        movie_html = get_url(movie_url)
        if movie_html is None:
            continue
        movie_info = parse_info(movie_html)
        with open('movie.txt', 'a', encoding='utf8') as f:
            # write() takes a single string and no encoding argument
            f.write(str(movie_info) + "\n")

if __name__ == "__main__":
    main()
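
A note on the file-writing step: write() on a text file accepts one string and has no encoding parameter, so passing the dict together with encoding='utf8' fails with a TypeError. Below is a minimal sketch of one way to store each record, assuming one JSON object per line is an acceptable format and that the dict values are already plain strings. append_record and read_records are hypothetical helper names, not part of the code above.

```python
import json


def append_record(path, record):
    """Append one dict to `path` as a single JSON line."""
    # The encoding is chosen when the file is opened; write() only gets the string.
    with open(path, "a", encoding="utf8") as f:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Inside the loop this would be called as append_record('movie.txt', movie_info).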

1 answer

  • binbincoder 2020-04-27 20:56
    proxies = {'http': 'http://' + proxy, 'https': 'http://' + proxy}  # proxy is an "ip:port" string
    web_content = requests.get(url, headers=headers, proxies=proxies, timeout=10)
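
    The snippet above hands a proxies mapping to requests.get. Here is a minimal sketch of building that mapping from an "ip:port" string, assuming a plain HTTP proxy that is also used for HTTPS pages. build_proxies is a hypothetical helper name, not something requests provides.

```python
def build_proxies(proxy):
    """Return a requests-style proxies mapping for an "ip:port" string."""
    # Accept either "ip:port" or a full URL such as "http://ip:port".
    proxy_url = proxy if "://" in proxy else "http://" + proxy
    # requests selects the entry by the scheme of the target URL, so an
    # https:// page needs an "https" key in addition to "http".
    return {"http": proxy_url, "https": proxy_url}
```

    It would then be used as requests.get(url, headers=headers, proxies=build_proxies(proxy), timeout=10).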
    