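Problem description and background
lxml reports that the XPath expression on line 26 of my script is invalid.
Relevant code (please do not paste screenshots)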
from lxml import etree
import requests

if __name__ == '__main__':
    url = 'https://m.58.com/bj/ershoufang/?reform=pcfront'
    # Spoof the User-Agent so the request looks like a mobile browser
    head = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Mobile Safari/537.36'
    }
    # Fetch the page (general-purpose crawl)
    page_text = requests.get(url=url, headers=head).text
    # Parse the page so it can be queried with XPath
    parser = etree.HTMLParser(encoding='utf-8')
    tree = etree.HTML(page_text, parser=parser)
    print(tree)
    li_list = tree.xpath('//ul[@class="list"]/li[@class="item-wrap"]')
    print(li_list)
    with open(r'../gotpages/58secondhand_houses.txt', 'w', encoding='utf-8') as stream:
        for li in li_list:
            # This is line 26 in my original file, where the error is raised
            house_name = li.xpath('./span[@class="content-title"]/text()]')
            #print(house_name)
            stream.write(house_name)
            print(house_name)
Output and error message
F:\pythonfiles\PycharmProjects\CRAWLER\venv\Scripts\python.exe "F:/pythonfiles/PycharmProjects/CRAWLER/focused crawler-Data analysis/crawler_58com realization in xpath.py"
Traceback (most recent call last):
  File "F:\pythonfiles\PycharmProjects\CRAWLER\focused crawler-Data analysis\crawler_58com realization in xpath.py", line 26, in <module>
    house_name = li.xpath('./span[@class="content-title"]/text()]')
  File "src\lxml\etree.pyx", line 1597, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid expression
Process finished with exit code 1
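A likely cause, judging from the traceback, is the stray `]` after `text()` in the failing expression: the only `[` in that expression is already closed after `"content-title"`, so the trailing bracket has nothing to match. The sketch below reproduces the error and shows one possible correction. It runs against a small inline HTML sample rather than the live page, so `SAMPLE_HTML` and the house names in it are assumptions made for illustration; the real 58.com markup may differ, and the `span` may not be a direct child of the `li`. It also joins the result into a string, because `xpath()` returns a list and `stream.write()` expects a string.

```python
from lxml import etree

# Hypothetical markup standing in for the real page (an assumption for illustration)
SAMPLE_HTML = """
<ul class="list">
  <li class="item-wrap"><span class="content-title">House A</span></li>
  <li class="item-wrap"><span class="content-title">House B</span></li>
</ul>
"""

tree = etree.HTML(SAMPLE_HTML)
li_list = tree.xpath('//ul[@class="list"]/li[@class="item-wrap"]')

# The expression from the question: the "]" after text() is unmatched
try:
    li_list[0].xpath('./span[@class="content-title"]/text()]')
    bad_expression_failed = False
except etree.XPathEvalError:
    bad_expression_failed = True

titles = []
for li in li_list:
    # Corrected expression: no trailing "]".
    # xpath() returns a list of text nodes, so join them into one string
    # before passing the value to something like stream.write().
    house_name = ''.join(li.xpath('./span[@class="content-title"]/text()')).strip()
    titles.append(house_name)

print(bad_expression_failed)
print(titles)
```

If the corrected expression returns an empty list on the real page, the `span` is probably nested deeper than one level; `.//span[@class="content-title"]/text()` would search all descendants of each `li` instead.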