Spider boy 2019-04-05 15:32 Acceptance rate: 0%
Views: 519

Python crawler: why does XPath return nothing from page source that clearly contains the data?

While scraping a novel site, I can see the values I want in the page's response, but extracting them fails.

Here is the code:

from lxml import etree
import requests

class Xiaoshuospider:
    def __init__(self):
        self.start_url = 'https://www.qiushuzw.com/t/38890/10253656.html'
        self.headers = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
            "Cache-Control": "max-age=0",
            "Connection": "keep-alive",
            "Cookie": "BAIDU_SSP_lcr=https://www.80txt.com/txtml_38890.html; Hm_lvt_c0ce681e8e9cc7e226131131f59a202c=1554447305; Hm_lpvt_c0ce681e8e9cc7e226131131f59a202c=1554447305; UM_distinctid=169ec4788554ea-0eba8d0589d979-1a201708-15f900-169ec4788562c1; CNZZDATA1263995655=929605835-1554443240-https%253A%252F%252Fwww.80txt.com%252F%7C1554443240",
            "Host": "www.qiushuzw.com",
            "If-Modified-Since": "Thu, 31 Jan 2019 03:00:17 GMT",
            "If-None-Match": 'W/"5c5264c1 - 3f30"',
            "Referer": "https://www.80txt.com/txtml_38890.html",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
        }

    def parse(self):
        res = requests.get(self.start_url,headers=self.headers).content.decode()
        html = etree.HTML(res)
        content = html.xpath("div[@class='book_content']/text()")
        print(content)

    def run(self):
        self.parse()

if __name__ == '__main__':
    xiaoshuo = Xiaoshuospider()
    xiaoshuo.run()
After running the XPath expression on the response, I get no novel text back: the chapter content cannot be extracted with XPath.

Has anyone run into the same problem?


1 answer

  • 堅持就是勝利! 2023-11-25 10:14

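    Check your XPath. The expression div[@class='book_content']/text() is a relative path, evaluated from the root <html> element that etree.HTML returns, so it only matches a div that is a direct child of <html>. Starting the expression with // (that is, //div[@class='book_content']/text()) searches the whole document instead.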
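
The difference between the relative and the document-wide expression can be shown without touching the network. The sketch below is only an illustration: SAMPLE_HTML is a made-up stand-in for the downloaded page (the real site's markup may differ), and extract_content is a hypothetical helper name, not part of the original code.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html>
  <body>
    <div class="book_content">First paragraph.<br/>Second paragraph.</div>
  </body>
</html>
"""


def extract_content(page_source):
    """Return the non-empty text nodes found inside div.book_content."""
    html = etree.HTML(page_source)
    # The leading // searches the whole tree, not just children of <html>.
    texts = html.xpath("//div[@class='book_content']/text()")
    return [t.strip() for t in texts if t.strip()]


html = etree.HTML(SAMPLE_HTML)

# Relative path from the question: only looks at direct children of <html>.
relative = html.xpath("div[@class='book_content']/text()")

# Document-wide path: finds the div wherever it is nested.
absolute = extract_content(SAMPLE_HTML)

print(relative)
print(absolute)
```

If the list is still empty against the live page, it may be worth removing the If-Modified-Since and If-None-Match headers copied from the browser, since a 304 Not Modified response carries no body to parse.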
