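Problem symptoms and background
This is my first attempt at downloading images with a script.
The text I get back contains unexplained "" characters, and they break the XPath matching that follows.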
import requests
from lxml import etree
index_url = 'https://baike.sogou.com/v64864633.htm'
header = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}
response = requests.get(index_url, headers=header)
print(response)
# The attribute is spelled `encoding`; a misspelled name is silently ignored by Python
response.encoding = 'utf-8'
print(response.text)
# Parse the HTML text into an element tree
selector = etree.HTML(response.text)
# Collect the title attribute of every image link
image_urls = selector.xpath('//a[@class="ed_image_link"]/@title')
# Print each extracted value
offset = 0
for image_url in image_urls:
    print(image_url)
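One possible way to narrow this down is to decode the raw bytes yourself and strip invisible characters before handing the text to the parser. The post does not say which characters actually appear, so the sketch below assumes they are a byte order mark or zero-width characters. The helper name `decode_and_clean` and the sample markup are made up for illustration.

```python
import re

# Characters that are invisible in most terminals but can still end up in
# scraped text: BOM (U+FEFF), zero-width space/joiners (U+200B to U+200D),
# and word joiner (U+2060). This set is an assumption, not a diagnosis.
_INVISIBLE = re.compile('[\ufeff\u200b\u200c\u200d\u2060]')


def decode_and_clean(raw: bytes, encoding: str = 'utf-8') -> str:
    """Decode bytes with an explicit encoding and drop invisible characters."""
    # errors='replace' keeps going on bad bytes instead of raising
    text = raw.decode(encoding, errors='replace')
    return _INVISIBLE.sub('', text)


# Hypothetical sample standing in for the page bytes
sample = '\ufeff<a class="ed_image_link" title="pic\u200b1">x</a>'.encode('utf-8')
cleaned = decode_and_clean(sample)
print(cleaned)
```

With the script above, this would roughly mean calling `decode_and_clean(response.content)` and passing the result to `etree.HTML(...)` instead of `response.text`. If the stray characters are still there afterwards, printing `repr()` of a short slice of the text should show their actual code points.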
