cyl531207502 2020-03-31 15:59

Problem scraping data from Lagou (lagou.com), please take a look, thanks

from urllib import request
from urllib import parse

# Ajax endpoint taken from the positionAjax.json request; city = Chengdu (URL-encoded)
url="https://www.lagou.com/jobs/positionAjax.json?city=%E6%88%90%E9%83%BD&needAddtionalResult=false"
header = {
"Accept": "application/json, text/javascript, */*; q=0.01",
# Accept-Encoding omitted: urllib does not decompress gzip/deflate/br bodies, so decode("utf-8") could fail
"Accept-Language": "zh-CN,zh;q=0.9",
"Connection": "keep-alive",
# Content-Length omitted: urllib computes it from the request body
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"Cookie":"_ga=GA1.2.1138539770.1585636133; gid=GA1.2.1776239920.1585636133; user_trace_token=20200331142853-c1dac458-3664-4392-ac4a-69c04bd926ad; LGUID=20200331142853-dc342d00-8bd1-4ef0-be94-e1fc963c7f66; index_location_city=%E6%88%90%E9%83%BD; sajssdk_2015_cross_new_user=1; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%221712f483c25225-00fccc2a691313-f313f6d-1049088-1712f483c2633d%22%2C%22%24device_id%22%3A%221712f483c25225-00fccc2a691313-f313f6d-1049088-1712f483c2633d%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%7D%7D; JSESSIONID=ABAAAECABBJAAGI1DC0715445E99FC390A39784928682B0; WEBTJ-ID=20200331143339-1712f4ba93753-01b5d59a319d9f-f313f6d-1049088-1712f4ba939425; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1585636134,1585636420; LGSID=20200331152720-18ff9ecf-6dde-4b69-a5ac-cafab6f4470c; PRE_UTM=; PRE_HOST=; PRE_SITE=https%3A%2F%2Fwww.lagou.com%2F; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2Fjobs%2Flist%5Fpython%3FlabelWords%3D%26fromSearch%3Dtrue%26suginput%3D; lagou_utm_source=A; gate_login_token=ab68787e51ce7cdc4177f8a0dc2bf580b680d718f0c2da0c; _putrc=08BDB62514E63F98; login=true; unick=%E7%A8%8B%E5%AE%87%E9%BE%99; _gat=1; showExpriedIndex=1; showExpriedCompanyHome=1; showExpriedMyPublish=1; hasDeliver=0; X_HTTP_TOKEN=0e03eaf6286772819580465851933e81482e1a7c06; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1585640860; privacyPolicyPopup=false; TG-TRACK-CODE=index_search; LGRID=20200331154749-a4a36d65-c7eb-4afb-ad17-70b31d6f293b; SEARCH_ID=6b9fa4ae7fe44a208c3dc67ed1ec3e44",
"Host": "www.lagou.com",
"Origin": "https://www.lagou.com",
"Referer":"https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=",
'Upgrade-Insecure-Requests': '1',
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "same-origin",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36",
"X-Anit-Forge-Code": "0",
"X-Anit-Forge-Token": "None",
"X-Requested-With": "XMLHttpRequest",
}
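# Form fields: first page of results (pn=1) for the keyword "python"
data = {
    "first": "true",
    "pn": 1,
    "kd": "python"
}
# origin_req_host expects a bare host name, not an urlencoded "Host=..." string;
# the Host header is already in `header`, so the argument is dropped
req = request.Request(url, headers=header, data=parse.urlencode(data).encode("utf-8"), method="POST")
resp = request.urlopen(req)
print(resp.read().decode("utf-8"))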
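If the browser's `Accept-Encoding: gzip, deflate, br` header is sent, the server may return a compressed body, and `resp.read().decode("utf-8")` then fails because urllib does not undo `Content-Encoding`. A minimal sketch of decoding such a body by hand; `decode_body` is a hypothetical helper name, and Brotli is deliberately left unsupported because it needs a third-party package.

```python
import gzip
import zlib
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str]) -> str:
    """Undo the Content-Encoding of a response body, then decode it as UTF-8."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        # "deflate" is normally zlib-wrapped, but some servers send a raw stream
        try:
            raw = zlib.decompress(raw)
        except zlib.error:
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    elif encoding not in ("", "identity"):
        # Brotli ("br") needs a third-party package, so it is rejected here
        raise ValueError("unsupported Content-Encoding: " + encoding)
    return raw.decode("utf-8")
```

With the question's code, the last line would become `print(decode_body(resp.read(), resp.headers.get("Content-Encoding")))`. The simpler option is to not send `Accept-Encoding` at all, in which case urllib asks for an uncompressed body.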

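The code is above. I only want to scrape the matching job listings, but it never works; the response is always:
{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"117.139.247.197","state":2402}
(the msg means "You are making requests too frequently, please try again later"). Yet the same search works fine in the browser. I took the URL from the positionAjax.json request and copied every request header into `header`, but it still fails. Does anyone know whether the site has upgraded its anti-scraping measures again, and how to get around it? Thanks.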
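A workaround often reported for this error (not confirmed in this thread) assumes the pasted `Cookie` header goes stale: the endpoint seems to expect cookies issued by a fresh visit to the search page in the same session. Under that assumption, the minimal urllib sketch below lets an `HTTPCookieProcessor` collect cookies from a GET of the list page and resend them with the POST, instead of hard-coding a cookie string. The helper names and `first=false` for later pages are hypothetical; the URLs are copied from the question and may no longer work.

```python
import json
from urllib import parse, request

# URLs and User-Agent copied from the question; the site may change them at any time
LIST_URL = "https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput="
AJAX_URL = ("https://www.lagou.com/jobs/positionAjax.json"
            "?city=%E6%88%90%E9%83%BD&needAddtionalResult=false")
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36")


def make_opener():
    """Opener that stores cookies from responses and resends them automatically."""
    return request.build_opener(request.HTTPCookieProcessor())


def build_list_request():
    """GET the search page first so the server can issue fresh session cookies."""
    return request.Request(LIST_URL, headers={"User-Agent": USER_AGENT}, method="GET")


def build_position_request(keyword, page):
    """POST to the Ajax endpoint; no Cookie header here, the opener adds cookies."""
    # Assumption: "first" is "true" only for page 1, as in the question's form data
    form = {"first": "true" if page == 1 else "false", "pn": page, "kd": keyword}
    headers = {
        "User-Agent": USER_AGENT,
        "Referer": LIST_URL,
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    }
    data = parse.urlencode(form).encode("utf-8")
    return request.Request(AJAX_URL, data=data, headers=headers, method="POST")


def fetch_positions(keyword="python", page=1):
    """Visit the list page, then query the Ajax endpoint within the same cookie session."""
    opener = make_opener()
    opener.open(build_list_request(), timeout=10).read()
    with opener.open(build_position_request(keyword, page), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_positions("python", 1)` sends two real requests. If it still returns state 2402, the limit is probably on the IP itself, as the accepted answer says.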

3 answers

  • 编程爱好者熊浪 2020-03-31 16:15

    The site has flagged your IP address for making requests too frequently.

    This answer was selected by the asker as the best answer.
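Since the diagnosis is per-IP rate limiting, the practical response is to slow down rather than add more headers. A minimal sketch, assuming the error body always looks like the one quoted in the question (`"status":false`, `"state":2402`); the function names are hypothetical and the wait times are illustrative, not limits published by Lagou.

```python
import json
import time


def is_rate_limited(body):
    """True if the body matches the error in the question: status false, state 2402."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not JSON (e.g. an HTML page), so not this error
    return (isinstance(payload, dict)
            and payload.get("status") is False
            and payload.get("state") == 2402)


def backoff_delays(base=5.0, factor=2.0, attempts=4):
    """Exponentially growing waits in seconds: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(attempts)]


def fetch_with_backoff(fetch, sleep=time.sleep, **kwargs):
    """Call fetch() and, while it reports the rate limit, wait and retry.

    `fetch` is any zero-argument callable returning the response text.
    """
    for delay in backoff_delays(**kwargs):
        body = fetch()
        if not is_rate_limited(body):
            return body
        sleep(delay)
    return fetch()  # last attempt; the caller checks the result again
```

For example, with the question's `req`: `fetch_with_backoff(lambda: request.urlopen(req).read().decode("utf-8"))`. Passing `sleep` as a parameter keeps the retry logic testable without actually waiting.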