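Problem and background
I'm scraping a web page with Python. The script runs without errors and writes an output file, but nothing was actually scraped T_T
Relevant code (please do not paste screenshots)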
import re

import pandas as pd
import requests


def dangdang(page):
    # Fetch one page of the shop's search results and extract each field with a regex.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4421.5 Safari/537.36'}
    url = 'https://shop393740542.taobao.com/search.htm?spm=a1z10.3-c-s.w4002-21563653144.84.79972400ER6CEp&_ksTS=1637388187519_221&callback=jsonp222&input_charset=gbk&mid=w-21563653144-0&wid=21563653144&path=%2Fsearch.htm&search=y&pageNo=' + str(page) + '#anchor'
    response = requests.get(url=url, headers=headers).text

    p_picture = '<div class="item3line1">.*? <img .*? src="(.*?)"></a>'
    p_name = '<div class="item3line1">.*?<a .*?>(.*?)</a>'
    p_sale = '<div class="sale-area">.*?<span class="sale-num">(.*?)</span>'
    p_comments = '<h4>.*?<span>(.*?)</span></a>'
    p_price = '<div class="cprice-area">.*?<span class="c-price">(.*?)</span>'

    picture = re.findall(p_picture, response, re.S)
    name = re.findall(p_name, response, re.S)
    comments = re.findall(p_comments, response, re.S)
    sale = re.findall(p_sale, response, re.S)
    price = re.findall(p_price, response, re.S)

    # Column keys: 商品封面 = cover image, 商品名 = product name, 价格 = price,
    # 评论数 = comment count, 月销量 = monthly sales.
    data = {'商品封面': picture, '商品名': name, '价格': price, '评论数': comments, '月销量': sale}
    # If every list is empty, this builds an empty DataFrame without raising.
    return pd.DataFrame(data)


# DataFrame.append was removed in pandas 2.0, so collect the pages and concatenate once.
pages = [dangdang(i) for i in range(1, 10)]  # pages 1 through 9
all_data = pd.concat(pages, ignore_index=True)
all_data.to_excel('银河汽水家售卖商品.xlsx', index=False)
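A side note on the extraction approach: running one independent `re.findall` per field only lines up row by row when every product has every field. A minimal sketch of an alternative, which first cuts the page into one block per product and then searches each block, is shown here. The sample markup, `ITEM_RE`, `FIELD_RES`, and `parse_items` are all hypothetical names made up for illustration; the real page structure may well differ.

```python
import re

# Hypothetical sample markup; the real shop page may be structured differently.
SAMPLE_HTML = '''
<dl class="item" data-id="1">
  <dd class="detail"><a href="#">Item A</a>
    <span class="c-price">9.90</span>
    <span class="sale-num">12</span>
  </dd>
</dl>
<dl class="item" data-id="2">
  <dd class="detail"><a href="#">Item B</a>
    <span class="c-price">19.90</span>
  </dd>
</dl>
'''

# One regex to isolate each product block, then one regex per field inside it.
ITEM_RE = re.compile(r'<dl class="item.*?</dl>', re.S)
FIELD_RES = {
    "name": re.compile(r'<dd class="detail">\s*<a [^>]*>(.*?)</a>', re.S),
    "price": re.compile(r'<span class="c-price">(.*?)</span>', re.S),
    "sale": re.compile(r'<span class="sale-num">(.*?)</span>', re.S),
}


def parse_items(html):
    """Return one dict per product block; a missing field becomes None."""
    rows = []
    for block in ITEM_RE.findall(html):
        row = {}
        for field, pattern in FIELD_RES.items():
            match = pattern.search(block)
            row[field] = match.group(1).strip() if match else None
        rows.append(row)
    return rows


rows = parse_items(SAMPLE_HTML)
for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)`, and a product with a missing field simply gets an empty cell instead of shifting the other columns.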
I suspected the regex patterns were the problem, so I changed them a little, but this version returns nothing either:
import re

import pandas as pd
import requests


def dangdang(page):
    # Same request as before; only the picture and name patterns were changed.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4421.5 Safari/537.36'}
    url = 'https://shop393740542.taobao.com/search.htm?spm=a1z10.3-c-s.w4002-21563653144.84.79972400ER6CEp&_ksTS=1637388187519_221&callback=jsonp222&input_charset=gbk&mid=w-21563653144-0&wid=21563653144&path=%2Fsearch.htm&search=y&pageNo=' + str(page) + '#anchor'
    response = requests.get(url=url, headers=headers).text

    # Tag is <dl> (lowercase L), not <d1>: <dd> elements sit inside a <dl> list.
    p_picture = '<dl class="item.*?" data-id=".*?">.*?<img alt=".*?" src="(.*?)"></a>'
    p_name = '<dd class="detail"><a .*?>(.*?)</a>'
    p_sale = '<div class="sale-area">.*?<span class="sale-num">(.*?)</span>'
    p_comments = '<h4>.*?<span>(.*?)</span></a>'
    p_price = '<div class="cprice-area">.*?<span class="c-price">(.*?)</span>'

    picture = re.findall(p_picture, response, re.S)
    name = re.findall(p_name, response, re.S)
    comments = re.findall(p_comments, response, re.S)
    sale = re.findall(p_sale, response, re.S)
    price = re.findall(p_price, response, re.S)

    # Column keys: 商品封面 = cover image, 商品名 = product name, 价格 = price,
    # 评论数 = comment count, 月销量 = monthly sales.
    data = {'商品封面': picture, '商品名': name, '价格': price, '评论数': comments, '月销量': sale}
    return pd.DataFrame(data)


# DataFrame.append was removed in pandas 2.0, so collect the pages and concatenate once.
pages = [dangdang(i) for i in range(1, 10)]  # pages 1 through 9
all_data = pd.concat(pages, ignore_index=True)
all_data.to_excel('银河汽水家售卖商品.xlsx', index=False)
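One thing that may be worth checking: the URL carries `callback=jsonp222`, which usually asks a server for a JSONP reply, that is, the markup wrapped in `jsonp222("...")` with every double quote escaped as `\"`. If that is what comes back here (an assumption; the server could just as well return a login or verification page), patterns written with plain `"` would not match until the payload is unwrapped. The sketch below uses a hypothetical helper, `unwrap_jsonp`, and a made-up sample string to show the effect.

```python
import json
import re


def unwrap_jsonp(text):
    """Decode the payload of `callback(<json>)`; return text unchanged if it is not JSONP-shaped."""
    match = re.fullmatch(r'\s*[\w$.]+\s*\((.*)\)\s*;?\s*', text, re.S)
    if not match:
        return text
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return text


# Made-up sample of what a JSONP reply could look like (quotes are backslash-escaped).
sample = 'jsonp222("<div class=\\"sale-area\\"><span class=\\"sale-num\\">12</span></div>")'
p_sale = '<div class="sale-area">.*?<span class="sale-num">(.*?)</span>'

raw_hits = re.findall(p_sale, sample, re.S)                  # pattern applied to the raw reply
clean_hits = re.findall(p_sale, unwrap_jsonp(sample), re.S)  # pattern applied after unwrapping
print(raw_hits, clean_hits)
```

Printing the first few hundred characters of `response` in the original script would show quickly whether the reply has this shape at all.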
Run result and error output
No errors. An Excel file is created, but it does not contain the content I wanted to scrape.
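This symptom (no exception, but an empty sheet) is what one would expect if every `re.findall` call returned an empty list, since a DataFrame built from empty lists is valid and simply has no rows. A small diagnostic sketch for confirming that is given here; `count_matches` and both sample strings are hypothetical, and in the real script the text passed in would be `response`.

```python
import re

# Two of the patterns from the question, keyed by field name.
PATTERNS = {
    "sale": '<div class="sale-area">.*?<span class="sale-num">(.*?)</span>',
    "price": '<div class="cprice-area">.*?<span class="c-price">(.*?)</span>',
}


def count_matches(html, patterns):
    """Return how many matches each pattern finds in html."""
    return {name: len(re.findall(p, html, re.S)) for name, p in patterns.items()}


# Made-up inputs: a page that is not a product listing, and one that is.
blocked_page = '<html><title>login</title></html>'
listing_page = (
    '<div class="sale-area"><span class="sale-num">12</span></div>'
    '<div class="cprice-area"><span class="c-price">9.90</span></div>'
)

blocked_counts = count_matches(blocked_page, PATTERNS)
listing_counts = count_matches(listing_page, PATTERNS)
print(blocked_counts)
print(listing_counts)
```

If all counts are zero for the real `response`, the next step would be to inspect the response text itself rather than to keep adjusting the patterns.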