一位不愿透露姓名的安东尼先生 2020-04-26 22:20 Acceptance rate: 0%
Views: 467
Closed

Why do I get the message: Unable to load source "<string>": Source unavailable?

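When get_url() reaches the line response = requests.get(), I get the message: Unable to load source "<string>": Source unavailable. The whole script then runs to the end without producing any result.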
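One possible reading, not confirmed anywhere in this thread, is that the message comes from a debugger rather than from requests itself: code that was built from a string has no file on disk, so a debugger that steps into it has nothing to display. The sketch below only demonstrates that background fact in plain Python. The function name `greet` and the helper `define_from_string` are hypothetical and are not part of the original script.

```python
def define_from_string():
    """Create a function with exec() and return it (hypothetical helper)."""
    namespace = {}
    # The function body lives only in this string, not in any .py file.
    exec("def greet():\n    return 'hi'\n", namespace)
    return namespace["greet"]


func = define_from_string()
# Python records the placeholder "<string>" as the source filename
# for code compiled by exec() from a plain string.
source_name = func.__code__.co_filename
print(source_name)
```

If that is what is happening here, stepping over the call instead of stepping into it would avoid the message, and the missing output would need a separate explanation.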

import json
import random
import time

import requests
from fake_useragent import UserAgent
from lxml import etree


def get_url(url):
    """Fetch a page and return its text, or None on a non-200 response."""
    headers = {
        "User-Agent": UserAgent().random
    }
    # Random pause between requests.
    time.sleep(random.randint(3, 9))
    response = requests.get(url, headers=headers)
    response.encoding = 'utf8'
    if response.status_code == 200:
        return response.text
    else:
        return None


def parse_index(html):
    """Return the detail-page URLs found on one listing page."""
    e = etree.HTML(html)
    movie_base_url = "https://maoyan.com{}?catId=3&showType=3"
    all_url = e.xpath('//div[@class="movie-item film-channel"]/a/@href')
    return [movie_base_url.format(url) for url in all_url]


def parse_info(html):
    """Extract the movie fields from a detail page as plain strings."""
    e = etree.HTML(html)
    # string(...) returns the text of the first matching node,
    # so the values can be written to a file directly.
    name = e.xpath('string(//h1[@class="name"])').strip()
    type_ = e.xpath('string(//li[@class="ellipsis"][1])').strip()
    contary_duration = e.xpath('string(//li[@class="ellipsis"][2])').strip()
    year = e.xpath('string(//li[@class="ellipsis"][3])').strip()
    introduce = e.xpath('string(//span[@class="dra"])').strip()
    return {
        "name": name,
        "type": type_,
        "contary_duration": contary_duration,
        "year": year,
        "introduce": introduce
    }


def main():
    base_url = "https://maoyan.com/films?catId=3&showType=3&offset={}"
    for i in range(0, 3):
        new_url = base_url.format(i * 30)
        # time.sleep(random.randint(2,4))
        html = get_url(new_url)
        if html is None:
            # Skip listing pages that did not return HTTP 200.
            continue
        movie_urls = parse_index(html)
        for movie_url in movie_urls:
            movie_html = get_url(movie_url)
            if movie_html is None:
                continue
            movie_info = parse_info(movie_html)
            with open('movie.txt', 'a', encoding='utf8') as f:
                # write() takes a single string, so serialize the dict first.
                f.write(json.dumps(movie_info, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    main()
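The write step at the end of main() can be tried in isolation, without touching the network. The file object's write() method accepts exactly one string and has no encoding parameter (the encoding is fixed when the file is opened), so a dict has to be serialized before it is written. The following is a minimal sketch under that assumption; the helpers `append_record` and `read_records`, the temporary path, and the sample records are all hypothetical.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one dict to the file at `path` as a single JSON line."""
    with open(path, "a", encoding="utf8") as f:
        # ensure_ascii=False keeps Chinese text readable in the file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Read back every JSON line written by append_record."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up sample data, written to a throwaway temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "movie.txt")
append_record(demo_path, {"name": "示例电影", "year": "2020"})
append_record(demo_path, {"name": "Example", "year": "2019"})
records = read_records(demo_path)
print(records)
```

One JSON object per line keeps the append mode of the original script and lets the file be read back record by record.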
