Problem description and background
I can't extract the image URL with a regular expression.
I am using the pattern
ex='<div class="media_bigpic_wrap"><img class="j_retract" id="(.*?)" src="(.*?)" onerror.*?</div>'
to extract, from this markup:
<div class="media_bigpic_wrap"><img class="j_retract" id="big_img_1668586480803" src="https://tiebapic.baidu.com/forum/pic/item/0cd7912397dda14415392970f7b7d0a20df486c4.jpg?tbpicau=2022-11-18-05_6b253562deaa086f40d12262ff9c2b7d" onerror="this.src='//tb2.bdstatic.com/tb/static-frs/img/v2/picerr.gif';this.width=82;this.height=75;" style="visibility: visible;"></div>
the image URL:
https://tiebapic.baidu.com/forum/pic/item/0cd7912397dda14415392970f7b7d0a20df486c4.jpg?tbpicau=2022-11-18-05_6b253562deaa086f40d12262ff9c2b7d
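As a sanity check, here is a minimal sketch (not part of my script) that runs the same pattern against the sample markup as a literal string. The names `sample`, `matches`, `img_id` and `img_src` are introduced only for this illustration. The pattern does match here, which suggests, but does not prove, that the problem is in the text `requests` receives rather than in the pattern itself.

```python
import re

# The pattern and the sample markup quoted above.
ex = '<div class="media_bigpic_wrap"><img class="j_retract" id="(.*?)" src="(.*?)" onerror.*?</div>'
sample = (
    '<div class="media_bigpic_wrap"><img class="j_retract" id="big_img_1668586480803" '
    'src="https://tiebapic.baidu.com/forum/pic/item/0cd7912397dda14415392970f7b7d0a20df486c4.jpg'
    '?tbpicau=2022-11-18-05_6b253562deaa086f40d12262ff9c2b7d" '
    'onerror="this.src=\'//tb2.bdstatic.com/tb/static-frs/img/v2/picerr.gif\';'
    'this.width=82;this.height=75;" style="visibility: visible;"></div>'
)

matches = re.findall(ex, sample, re.S)
# The pattern has two capture groups, so each match is an (id, src) tuple.
img_id, img_src = matches[0]
print(img_id)
print(img_src)
```

Because `re.findall` returns tuples when a pattern has more than one group, any loop over the results has to unpack the tuple (or use a single group) before passing the URL to `requests.get`.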
Code
import requests
import re
import os

if __name__ == '__main__':
    # Create the output directory on the first run.
    if not os.path.exists('./nbaLibs'):
        os.mkdir('./nbaLibs')
    url = 'https://tieba.baidu.com/f?kw=nba'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.42'
    }
    page_text = requests.get(url=url, headers=headers).text
    ex = '<div class="media_bigpic_wrap"><img class="j_retract" id="(.*?)" src="(.*?)" onerror.*?</div>'
    #ex ='(?:https?:\/\/)?[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(?:\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+.+\.(gif|png|jpg|jpeg|webp|svg|psd|bmp|tif)'
    img_src_list = re.findall(ex, page_text, re.S)
    print(img_src_list)
    # Markup I am trying to match:
    #<div class="media_bigpic_wrap"><img class="j_retract" id="big_img_1668586480803" src="https://tiebapic.baidu.com/forum/pic/item/0cd7912397dda14415392970f7b7d0a20df486c4.jpg?tbpicau=2022-11-18-05_6b253562deaa086f40d12262ff9c2b7d" onerror="this.src='//tb2.bdstatic.com/tb/static-frs/img/v2/picerr.gif';this.width=82;this.height=75;" style="visibility: visible;"></div>
    # Image URL: https://tiebapic.baidu.com/forum/pic/item/0cd7912397dda14415392970f7b7d0a20df486c4.jpg?tbpicau=2022-11-18-05_6b253562deaa086f40d12262ff9c2b7d
    # The pattern has two groups, so each match is an (id, src) tuple.
    for img_id, src in img_src_list:
        img_data = requests.get(url=src, headers=headers).content
        # Drop the query string so the file name contains no '?'.
        image_name = src.split('/')[-1].split('?')[0]
        imaPath = './nbaLibs/' + image_name
        with open(imaPath, 'wb') as fp:
            fp.write(img_data)
        print(image_name, 'downloaded successfully!')
Output and error messages
The result is an empty list; the image URL I want is not extracted.
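An empty list from `re.findall` means the pattern matched nowhere in `page_text`. One possible explanation, which is only an assumption here, is that the `media_bigpic_wrap` markup is inserted by JavaScript in the browser and so never appears in the HTML that `requests` downloads. A rough way to check is to count a few marker substrings in the response before blaming the pattern. The helper `count_markers` and the stand-in string `fake_page_text` below are hypothetical names used only for this sketch.

```python
def count_markers(page_text, markers):
    """Return how many times each marker substring occurs in page_text."""
    return {marker: page_text.count(marker) for marker in markers}


# Stand-in for the real response; in the script above this would be page_text.
fake_page_text = '<html><body><img class="thumb" src="a.jpg"></body></html>'

report = count_markers(fake_page_text, ['media_bigpic_wrap', 'j_retract', '<img'])
print(report)
```

If `media_bigpic_wrap` shows a count of 0 for the real `page_text`, no pattern built around that `div` can match, and the next step would be to save `page_text` to a file and look at how the images are actually represented in the downloaded HTML.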