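I started learning web scraping not long ago and have run into a problem. I'm scraping with XPath, and because the detail pages are reached indirectly, the link extracted in the first step (part_link) comes back as a list. I convert it to a string (part_link_s) and then try to request the detail page, but I keep getting an error saying the argument (link) is invalid. Can anyone explain?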
import requests
from lxml import etree

# url and headers are not shown in this snippet
res = requests.get(url, headers=headers)
selector = etree.HTML(res.text)
url_infos = selector.xpath('//div[@class="book-list"]/ul/li')
for url_info in url_infos:
    part_link = url_info.xpath('div[2]/a/@href')  # xpath() always returns a list
    part_link_s = str(part_link)                  # str() of a list keeps the brackets
    link = 'http:' + part_link_s
    res1 = requests.get(link, headers=headers)
    selector = etree.HTML(res1.text)
    infos = selector.xpath('//div[@class="book-intro"]/p')
    for info in infos:
        f_link = info.xpath('div[1]/p')
        print(len(f_link))
Running it gives:
raise InvalidURL("Invalid URL %r: No host supplied" % url)
requests.exceptions.InvalidURL: Invalid URL 'http:[]': No host supplied
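The traceback shows the URL that was actually requested: `'http:[]'`. Calling `str()` on the list that `xpath()` returns keeps the square brackets, and an empty list becomes `'[]'`. That means the XPath `div[2]/a/@href` matched nothing for at least one `li`. The small check below reproduces this without any network access; the href value is made up for illustration.

```python
# Hypothetical values standing in for what url_info.xpath('div[2]/a/@href') returns
empty_result = []                           # XPath matched nothing
one_result = ['//book.example.com/info/1']  # XPath matched one href (made-up URL)

link_from_empty = 'http:' + str(empty_result)  # what happened in the traceback
link_from_str = 'http:' + str(one_result)      # brackets and quotes end up in the URL
link_from_item = 'http:' + one_result[0]       # using the element itself works

print(link_from_empty)  # http:[]
print(link_from_str)    # http:['//book.example.com/info/1']
print(link_from_item)   # http://book.example.com/info/1
```

So even when the XPath does match, `str(part_link)` would still produce an invalid URL; the string inside the list is what is needed.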
How can I fix this?
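One possible fix, sketched under the assumption that the hrefs on the list page are protocol-relative (`//host/path`), which is what the `'http:'` prefix suggests: take the first element of the list instead of calling `str()` on it, skip the item when the list is empty, and let `urllib.parse.urljoin` build the absolute URL from the listing page's address. `first_href_to_url` is a hypothetical helper name, and the URLs are made up.

```python
from urllib.parse import urljoin


def first_href_to_url(hrefs, page_url):
    """Turn the list returned by xpath('.../@href') into one absolute URL.

    Returns None when the XPath matched nothing, so the caller can skip the
    item instead of requesting an invalid URL such as 'http:[]'.
    """
    if not hrefs:              # xpath() returns [] when nothing matched
        return None
    href = hrefs[0].strip()    # use the string itself, not str(list)
    # urljoin handles protocol-relative ('//host/path'), root-relative ('/path')
    # and absolute hrefs, taking the scheme/host from the listing page URL.
    return urljoin(page_url, href)


print(first_href_to_url(['//book.example.com/info/1'], 'https://www.example.com/list'))
# https://book.example.com/info/1
print(first_href_to_url([], 'https://www.example.com/list'))
# None
```

In the loop from the question, the `part_link_s = ...` and `link = ...` lines would become `link = first_href_to_url(part_link, url)` followed by `if link is None: continue`. Using `urljoin` rather than a hard-coded `'http:'` prefix means the detail URL follows the listing page's scheme and also works if some hrefs are relative. If many items end up skipped, the XPath `div[2]/a/@href` itself may not match the page's markup, which is worth checking against the page source.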