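I am scraping tweets from more than 590 users. The logic is to first obtain a user's id, then use that id to build a GET request to the tweets API.
In other words, scraping one user's tweets takes two requests, so 590 users need at least 1,180 requests.
But Twitter only allows 1,000 requests at a time, so my request count exceeds the limit, and once it does the requests fail with an error.
If I then restart the program, scraping works normally again.
So I wondered whether I need to sleep for a while in between, and I added a sleep at the point where the program hits the error. But after the sleep ends, it still reports that the 1,000-request limit was exceeded.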
import random
import time

import requests

# Fragment from inside a method of the scraper class.
# `i` (a retry counter), `username`, `words`, `csrf_token_list` and `get_token`
# are defined outside this fragment.
while True:
    # Request 1 of 2: look up the user's id from the screen name
    user_id_url = f'https://twitter.com/i/api/graphql/hc-pka9A7gyS3xODIafnrQ/UserByScreenName?variables=%7B%22screen_name%22%3A%22{username}%22%2C%22withHighlightedLabel%22%3Atrue%7D'
    user_id_url = user_id_url.replace("%27", "%22").replace("True", "true")
    # Pick one set of credentials at random for this request
    csrf_token = random.choice(csrf_token_list)
    cookie = words[csrf_token]['cookie']
    authorization = words[csrf_token]['authorization']
    self.headers['cookie'] = cookie
    self.headers['authorization'] = authorization
    resp = requests.get(
        url=user_id_url,
        headers=self.headers
    )
    if '"Bad guest token"' in resp.text or not resp.text:
        # Fetch a new x-guest-token and retry
        guest_token = get_token()
        self.headers['x-guest-token'] = guest_token
        if i == 4:
            return None
        continue
    if '"name":"NotFoundError"' in resp.text or 'AuthorizationError' in resp.text:
        # The account does not exist
        return None
    if 'RateLimitedError' in resp.text:
        # Rate limited: log, sleep 20 seconds, then give up on this user
        print('Sleeping')
        print(username)
        print(resp.text)
        time.sleep(20)
        return None
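
One direction I am considering, shown as a minimal and untested sketch: instead of sleeping a fixed 20 seconds and returning None, wait until the rate-limit window resets and then retry the same request. This assumes the response carries an `x-rate-limit-reset` header holding a Unix timestamp, as the documented public Twitter API does; I have not confirmed that this internal GraphQL endpoint behaves the same way. The names `seconds_until_reset`, `fetch_with_rate_limit_retry` and `fetch` are hypothetical helpers, not part of my current code.

```python
import time


def seconds_until_reset(headers, now=None, default=60.0):
    """Return how many seconds to wait before retrying.

    Assumption: the server reports the reset time as a Unix timestamp in an
    `x-rate-limit-reset` header. If the header is missing or malformed,
    fall back to `default` seconds.
    """
    now = time.time() if now is None else now
    try:
        reset_at = float(headers["x-rate-limit-reset"])
    except (KeyError, TypeError, ValueError):
        return default
    # Never return a negative wait if the reset time is already in the past
    return max(reset_at - now, 0.0)


def fetch_with_rate_limit_retry(fetch, max_attempts=5, sleep=time.sleep):
    """Call fetch() until the response is not rate limited.

    `fetch` is a hypothetical zero-argument callable returning an object with
    `.text` and `.headers` (for example a requests.Response).
    Returns the first non-rate-limited response, or None after max_attempts.
    """
    for _ in range(max_attempts):
        resp = fetch()
        if "RateLimitedError" not in resp.text:
            return resp
        # Rate limited: wait for the window to reset, then retry the same request
        sleep(seconds_until_reset(resp.headers))
    return None
```

In my loop this would wrap the existing call, for example `fetch_with_rate_limit_retry(lambda: requests.get(url=user_id_url, headers=self.headers))`. The public API documents 15-minute rate-limit windows, so if this endpoint is similar, a 20-second sleep would be far too short, which might explain why the error is still there after the sleep.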