空条三锅 2021-02-08 11:41 Acceptance rate: 0%

Simple crawler question, asking the experts for help

I can't get my XPath expression to match the numbers on the page.

from lxml import etree
from bs4 import BeautifulSoup
import requests
import time
url='http://glidedsky.com/level/web/crawler-basic-1'
def main(url1):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56','Cookie': 'acw_tc=2760820316125972250506308e6ec5bbbfb0ee4f05cd6654aa87c9ff316107; xq_a_token=176b14b3953a7c8a2ae4e4fae4c848decc03a883; xqat=176b14b3953a7c8a2ae4e4fae4c848decc03a883; xq_r_token=2c9b0faa98159f39fa3f96606a9498edb9ddac60; xq_id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTYxMzQ0MzE3MSwiY3RtIjoxNjEyNTk3MjIxMzA2LCJjaWQiOiJkOWQwbjRBWnVwIn0.lN8zzdiJLPwJHb0PXSKWs-HcgwiNYPFt4O2IsiSHEhPC_6GpsWYaql2MGgtaI0M0v--HY2KHIIunB5q6ZygzWACEzk8gpYqWcIG3zEzJBAeP89GpTxFxD8oyWXDm_DWLcYdI77LmihrwgR2DYO58CoFzy-PE2s3SBf6_zoul0gV5vfEhfWsek1xEORe6fC5hq93p9Xx55tpQA_h1E1t0ir_PPa_5Y1EqZXKW8baSoVfHVWLgc-INRDkKSG5A8oQUT2D2rsW4M8i9Qb1u1FWqF2hvoG82XazkiqZyr1edqnKoeKkAcKEz6bvjOMZZA7cdl_1MRF97jMx7pS5hsQrUQA; u=651612597225055; Hm_lvt_1db88642e346389874251b5a1eded6e3=1612597226; Hm_lpvt_1db88642e346389874251b5a1eded6e3=1612597226; device_id=24700f9f1986800ab4fcc880530dd0ed'}  # spoof a browser User-Agent
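    r=requests.get(url1,headers=headers)
    # etree.HTML needs the page text (a str), not the Response object
    r1=etree.HTML(r.text)
    # Text nodes inside every <div class="col-md-1">
    a=r1.xpath('//div[@class="col-md-1"]//text()')
    print([t.strip() for t in a if t.strip()])
    # Same lookup with BeautifulSoup, for comparison
    soup=BeautifulSoup(r.text,'lxml')
    data=soup.find_all('div',class_='col-md-1')
    print(data)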
if __name__ == '__main__':
    main(url)
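
When an XPath query keeps coming back empty, it can help to try the expression on a small static snippet first, with no network or login involved. The following is only a minimal sketch, assuming the numbers sit as plain text inside <div class="col-md-1"> elements. SAMPLE_HTML and extract_numbers are made-up names for illustration, and the markup is invented rather than copied from the site.

```python
from lxml import etree

# Hypothetical sample markup imitating a grid of numbers (not fetched from the site)
SAMPLE_HTML = """
<html><body>
  <div class="row">
    <div class="col-md-1">
      123
    </div>
    <div class="col-md-1">
      45
    </div>
    <div class="col-md-1">
      6
    </div>
  </div>
</body></html>
"""


def extract_numbers(html):
    """Return the integers found directly inside <div class="col-md-1"> elements."""
    tree = etree.HTML(html)  # parse a str of HTML
    texts = tree.xpath('//div[@class="col-md-1"]/text()')
    # Text nodes keep their surrounding whitespace, so strip before converting
    return [int(t.strip()) for t in texts if t.strip()]


if __name__ == '__main__':
    print(extract_numbers(SAMPLE_HTML))
```

If this works on the snippet but the live page still gives an empty list, the response is probably not the page you expect (a login page, for instance). Printing r.text[:500] and checking that the Cookie was copied from a logged-in glidedsky.com session would be a reasonable next step. Note also that @class="col-md-1" only matches when the attribute is exactly that string; if the real divs carry several classes, contains(@class, "col-md-1") may be needed instead.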

2 answers

  • 坚持不懈的大白 (top-rated front-end creator) 2021-02-08 12:46

    Please post your code!
