Tmall review scraper blocked by anti-crawling measures
```python
import json
import random
import time

import pandas as pd
import requests as rq


def get_review(url, goodname, goodclass):
    # Wait a random 10-20 seconds before each request
    time.sleep(random.uniform(10, 20))
    contents = []
    headers = {
        'cookie': '(hidden)',
        'user-agent': r'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
        'referer': 'https://detail.tmall.com/item.htm?spm=a1z10.3-b-s.w4011-14595640457.298.7ebb17b1PfGDbX&id=623394554673&rn=9f66587e5923b5e737cbbc016de9b677&skuId=4577070898930',
        'accept': '*/*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9',
    }
    response = rq.get(url, headers=headers).text
    # time.sleep(5)
    try:
        # The body is JSONP: keep only what sits between the first '(' and the last ')',
        # so that a ')' inside a review does not cut the JSON short
        response = response[response.index('(') + 1:response.rindex(')')]
        data = json.loads(response)
        for i in data['rateDetail']['rateList']:
            if i['appendComment'] is not None:
                # Join the original review and the follow-up review; use the follow-up date
                review = str(i['rateContent']) + ',' + str(i['appendComment']['content'])
                bt = str(i['appendComment']['commentTime']).split(' ')[0]
            else:
                review = str(i['rateContent'])
                bt = str(i['rateDate']).split(' ')[0]
            contents.append([review, bt, goodname, goodclass])
        return pd.DataFrame(data=contents, columns=['comment', 'time', 'good_name', 'good_type'])
    except Exception as e:
        # Print the actual error instead of hiding it behind a bare except
        print('Something went wrong:', e)
```
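###### The cookie is taken from a logged-in session. After crawling about twenty pages, errors start to appear. The main problem is that `requests.get` returns the following: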
{"rgv587_flag": "sm", "url": "//rate.tmall.com:443/list_detail_rate.htm/_____tmd_____/punish?x5secdata=5e0c8e1365474455070961b803bd560607b52cabf5960afff39b64ce58073f78d67c783afbf2f1429bb88d22e9de8dc924fb9b529c904864bbfe3d3fb7f481ac654959777bb93a8a46736b198f52750ae0f0058c9e35ca1342909838a622e73d45cb0ce36cef9f62cbd52852a03cf8ba461ee819ca12264cfd380e1ff9a31817eb7bb56718d9045e71a36bf5c104a31381c5772d63b0f57d2db7904d218e4b5cb2119896354387522c4277060a306d8779a52a883b8d79c21b1904b01749f64fd67c783afbf2f1429bb88d22e9de8dc924fb9b529c904864bbfe3d3fb7f481ac654959777bb93a8a46736b198f52750aba054159bd12f4e49383632589de52415127ba80eadd57577355098f1203f81009490045313404a034c300f6f334c988cf8b3d6c14e48c2ab40794cc1e1a04bd43057e2edd1838e1ccc05f4f01cfb61713b9e53b3a344694df999179f2180f7b845ebbb7c256e077889b653f76774fc4c74ee8e9e999cde873522a2663ee17e879a23c364d635f7a361193b1d191cf8fed81c65bebf3b9df46dd6afed6f19989714844c0713ecd4e394877978ee6e9104491e6e26e712b31d9b7ccb1b645df8a5ff640b33682743330a508a275a3f26aab570034bcc3f82cda2fb36536ba0f78b251916ca16b3d87deb696c0814bb75c8a97df8bab9ed7ed243b3656c0b7e1004b356f289e0ac65f51f6426fd9a3e03b41ca3ac6eb5a5b9020a1340974a8361a&x5step=2"}
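According to online tutorials, this is Tmall's slider verification, but handling it requires selenium. Is there a way to get past the slider directly?
If not, can the slider verification be handled on top of the existing code? I really don't want to rewrite everything with selenium.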
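
For reference, here is a minimal sketch of how the existing code could at least recognise this response instead of falling into the generic error branch. It does not get past the slider; it only detects the redirect. `is_blocked` is a hypothetical helper name, and using the `rgv587_flag` key as the marker is an assumption based solely on the single response pasted above.

```python
import json


def is_blocked(text):
    """Return True if text looks like the verification redirect shown above.

    Assumption: a blocked response is a JSON object containing the key
    'rgv587_flag', as in the one sample from this post.
    """
    body = text.strip()
    # A normal response is JSONP, e.g. callback({...}); unwrap it if present
    if not body.startswith('{') and '(' in body and ')' in body:
        body = body[body.index('(') + 1:body.rindex(')')]
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON at all, so it is not the redirect we are looking for
        return False
    return isinstance(data, dict) and 'rgv587_flag' in data
```

In `get_review`, this check could run on `response` right after `rq.get(...)`, so that the loop stops or pauses once the verification page appears rather than continuing to send requests.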