I want to scrape the selectable option values from a product listing, so I first collected the following set of elements:
import requests
import re
import os
from lxml import etree
from bs4 import BeautifulSoup

if __name__ == "__main__":
    # Fetch the page source
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    }  # Spoof the User-Agent
    url = 'https://item.jd.com/10026876494242.html'
    page = requests.get(url=url, headers=headers)
    page_text = page.text
    # Parse the HTML and collect every <div id="choose-attrs"> element
    soup = BeautifulSoup(page_text, "html.parser")
    choose_list = soup.find_all("div", {"id": "choose-attrs"})
    print(choose_list)
Running it produces:
[<div id="choose-attrs">
<div class="li p-choose" data-idx="0" data-type="颜色" id="choose-attr-1">
<div class="dt">选择颜色 </div>
<div class="dd">
<div class="item" data-sku="10026876494233" data-value="红色">
<b></b>
<a clstag="shangpin|keycount|product|yanse-红色" href="#none">
<img alt="红色" data-img="1" height="40" src="//img13.360buyimg.com/n9/s40x40_jfs/t1/188751/27/9518/163751/60d01092E2c530e92/1b74c84a46058a5a.jpg" width="40"/><i>红色</i>
</a>
</div>
<div class="item" data-sku="10026876494236" data-value="绿色">
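The rest is omitted because it is too long. What I want is the data-value attribute of each item, but every expression I have written either raises an error or returns an empty result. How should I write it?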
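
One possible approach, as a minimal sketch rather than a definitive answer: find_all returns a list-like ResultSet, so indexing it with an attribute name or calling tag methods on it directly fails. That is one common cause of this kind of error, though it may not be what happened here. The sketch below selects each div with class "item" inside the element with id "choose-attrs" and reads its data-value attribute. SAMPLE_HTML and extract_data_values are hypothetical names introduced only for illustration. The sample is a trimmed, simplified copy of the output above, with the truncated second item closed by hand, and it assumes the real page has the same structure.

```python
from bs4 import BeautifulSoup

# Trimmed, simplified stand-in for page.text (assumption: the real page
# has the same structure as the printed output in the question).
SAMPLE_HTML = """
<div id="choose-attrs">
  <div class="li p-choose" data-idx="0" data-type="颜色" id="choose-attr-1">
    <div class="dt">选择颜色 </div>
    <div class="dd">
      <div class="item" data-sku="10026876494233" data-value="红色"><b></b></div>
      <div class="item" data-sku="10026876494236" data-value="绿色"><b></b></div>
    </div>
  </div>
</div>
"""


def extract_data_values(html):
    """Return the data-value of every option item under #choose-attrs."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: div.item elements that carry a data-value attribute,
    # anywhere below the element whose id is "choose-attrs".
    items = soup.select("#choose-attrs div.item[data-value]")
    # Each result is a Tag, so attributes are read with dictionary syntax.
    return [item["data-value"] for item in items]


if __name__ == "__main__":
    print(extract_data_values(SAMPLE_HTML))
```

With the real page, passing page_text instead of SAMPLE_HTML should work the same way, provided the server returns this markup in the static HTML. If you prefer to start from choose_list, loop over it and call find_all("div", class_="item") on each element rather than on the list itself.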