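While scraping a novel website, I noticed that the values I want are visible in the page's response, but I run into trouble when I try to extract them.
Here is the problem in detail: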
```python
from lxml import etree
import requests


class Xiaoshuospider:
    def __init__(self):
        self.start_url = 'https://www.qiushuzw.com/t/38890/10253656.html'
        self.headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
            "Cache-Control": "max-age=0",
            "Connection": "keep-alive",
            "Cookie": "BAIDU_SSP_lcr=https://www.80txt.com/txtml_38890.html; Hm_lvt_c0ce681e8e9cc7e226131131f59a202c=1554447305; Hm_lpvt_c0ce681e8e9cc7e226131131f59a202c=1554447305; UM_distinctid=169ec4788554ea-0eba8d0589d979-1a201708-15f900-169ec4788562c1; CNZZDATA1263995655=929605835-1554443240-https%253A%252F%252Fwww.80txt.com%252F%7C1554443240",
            "Host": "www.qiushuzw.com",
            "If-Modified-Since": "Thu, 31 Jan 2019 03:00:17 GMT",
            "If-None-Match": 'W/"5c5264c1 - 3f30"',
            "Referer": "https://www.80txt.com/txtml_38890.html",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
        }

    def parse(self):
        res = requests.get(self.start_url, headers=self.headers).content.decode()
        html = etree.HTML(res)
        content = html.xpath("div[@class='book_content']/text()")
        print(content)

    def run(self):
        self.parse()


if __name__ == '__main__':
    xiaoshuo = Xiaoshuospider()
    xiaoshuo.run()
```
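One thing worth ruling out first: the headers copied from the browser include `If-Modified-Since` and `If-None-Match`. These turn the request into a conditional one, so a server that considers the page unchanged may answer `304 Not Modified` with an empty body, which leaves nothing to parse. A minimal sketch of dropping them before sending (the helper name `strip_conditional` is hypothetical, not part of `requests`):

```python
# Cache-validation headers copied from the browser's DevTools.
# When they match, the server may reply 304 Not Modified with an empty body.
CONDITIONAL_HEADERS = {"if-modified-since", "if-none-match"}


def strip_conditional(headers):
    """Return a copy of `headers` without conditional-request headers.

    Header names are compared case-insensitively, as HTTP requires.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CONDITIONAL_HEADERS
    }
```

In `parse`, the request would then become `requests.get(self.start_url, headers=strip_conditional(self.headers), timeout=10)`, and printing `resp.status_code` before parsing shows right away whether the server returned 200 with a body or a 304 without one.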
- After processing the page with the xpath rule above, I can't find the corresponding novel text: the chapter content cannot be extracted with xpath.
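The xpath expression itself is a likely culprit. `etree.HTML()` returns the `<html>` root element, and an expression without a leading `/` is evaluated relative to that node, so `div[@class='book_content']` only matches `<div>` elements that are direct children of `<html>`. There normally are none, so the result is `[]`. Prefixing `//` searches the whole document instead. The sketch below runs both expressions on a small, made-up page; it assumes (as the original xpath implies) that the chapter text sits in `<div class="book_content">` with `<br>` line breaks. `extract_chapter` and `SAMPLE` are hypothetical names for illustration.

```python
from lxml import etree

# Made-up, simplified chapter page; the real page's structure is assumed, not verified.
SAMPLE = """
<html><body>
  <div class="header">navigation</div>
  <div class="book_content">Paragraph one.<br>Paragraph two.<br>Paragraph three.</div>
</body></html>
"""


def extract_chapter(page_html, xpath="//div[@class='book_content']/text()"):
    """Return the non-blank text nodes matched by `xpath`, stripped."""
    if not page_html.strip():
        # An empty body (e.g. from a 304 response) has nothing to parse.
        return []
    root = etree.HTML(page_html)  # root is the <html> element
    # text() also returns the text following each <br>, one string per line.
    return [s.strip() for s in root.xpath(xpath) if s.strip()]
```

With the default `//div[...]` expression this returns the three paragraphs; passing the original relative expression `div[@class='book_content']/text()` returns an empty list. If the text then comes out garbled, note that `.content.decode()` assumes UTF-8; many Chinese novel sites serve GBK, so it may be worth checking the page's declared charset or using `resp.apparent_encoding`.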
Has anyone run into the same problem?