劲仔小鱼 2021-04-26 19:10

Python crawler: downloading a zip with requests fails with "404 Client Error"

My code:

import requests
url = r'http://www.synopsys.com/news/pubs/snug/2015/austin/user-papers.zip'
headers = {
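    # No hard-coded 'Host' header: requests derives it from the URL, whereas a
    # fixed value would also be sent to every host the request is redirected to.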
    'Connection':r'keep-alive',
    'Upgrade-Insecure-Requests':r'1',
    'User-Agent':r'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.72 Safari/537.36 Edg/89.0.774.45',
    'Accept':r'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding':r'gzip, deflate, br',
    'Cookie': (r'_mkto_trk=id:367-MRV-360&token:_mch-synopsys.com-1592796909619-52572; ELOQUA=GUID=15DC119A3A304AF28B4FDCEF3338A033; s_ecid=MCMID%7C58118164175654254161105037777457862346; s_vi=[CS]v1|2F78AB15051599D1-4000092903B786A6[CE]; _ga=GA1.2.342071361.1592796901; coveo_visitorId=d6972c4a-f26a-4860-8f76-ff116b757904; Hm_lvt_3216a4b0b0e27a0fb7653da6e32487d2=1612407745; __ncuid=1c65c8f9-b01d-4f5f-9523-0ae40ace8c78; AAMC_synopsys_0=REGION%7C11; lfuuid=a8b5811b-3a59-4536-8cf9-d7407a1553a9-c155236-sw1920-sh1080-ms1603962065892-r6903481; sat_track=true; BALANCEID=balancer.lb1; AMCVS_96E61CFE53295EF20A490D45%40AdobeOrg=1; LSKey[CoveoV2]coveo_visitorId=d6972c4a-f26a-4860-8f76-ff116b757904; s_pers=%20gpv_pageName%3D%252Fpage_login_timeout.html%7C1618564045859%3B%20s_nr%3D1618562245887-Repeat%7C1650098245887%3B; s_sess=%20s_cc%3Dtrue%3B%20s_ev23_persist%3Dsiengine.com%3B%20s_ev24_persist%3DCUSTOMER%3B%20s_ev29_persist%3DChina%3B%20s_ev30_persist%3DShanghai%3B%20s_sq%3Dsynopsyssolvnetplusprod%253D%252526pid%25253DDoc%2525253A2021.03%2525253AVC%25252520Formal%2525253Aindex.html%2525253ASynopsys%25252520VC%25252520Static%25252520Platform%25252520Reference%25252520Guide%252526pidt%25253D1%252526oid%25253Dfunction%25252528param_event%25252529%2525257B%25252527usestrict%25252527%2525253Bvarresult%2525252Cevent%2525252Cparent_div%2525253Bresult%2525253Dtrue%2525253B%2525252F%2525252FAccessevent%2525252F%2525252Fevent%2525253Dparam_%252526oidt%25253D2%252526ot%25253DSPAN%3B; iPlanetDirectoryPro=E2B58C430631E1C41DB2D4973E8FD15C; s_cc=true; check=true; s_eVar5=%5B%5BB%5D%5D; _gid=GA1.2.700700754.1619323589; _mkto_trk=id:367-MRV-360&token:_mch-synopsys.com-1592796909619-52572; '
               r'AMCV_96E61CFE53295EF20A490D45%40AdobeOrg=-715282455%7CMCIDTS%7C18743%7CMCMID%7C58118164175654254161105037777457862346%7CMCAAMLH-1620031430%7C11%7CMCAAMB-1620031430%7C6G1ynYcLPuiQxYZrsz_pkqfLG9yMXBpb2zX5dvJdYQJzPXImdj0y%7CMCOPTOUT-1619433830s%7CNONE%7CvVersion%7C4.2.0%7CMCAID%7CNONE%7CMCSYNCSOP%7C411-18750%7CMCCIDH%7C-349511492; TS01901308=01734f5e7ffb54914b6ebd23e0ba7dd1603ca53a16845011a2cfaad0cc1ebb3387d4f0489c468bd486f5acc186a214a50942d5339b; referrer_url=https://www.synopsys.com/community/snug/snug-world.html; AWSALB=pYF1pwFaSzwuD5sIzjkQdEBUjZH4qBJM8A9Vy+RQPvnb2B13Krh1XEXKkuCp+lP13Fk/D8ycOD2YFOre1cQYw1fhbUey5GqA6rhtzMVQaAmNjmgAWyANVSwyqLOt; AWSALBCORS=pYF1pwFaSzwuD5sIzjkQdEBUjZH4qBJM8A9Vy+RQPvnb2B13Krh1XEXKkuCp+lP13Fk/D8ycOD2YFOre1cQYw1fhbUey5GqA6rhtzMVQaAmNjmgAWyANVSwyqLOt; mbox=PC#1f1bc295ff044a04b1b4f2e42f00b586.38_0#1656041702|session#341f80b270fb424e828e060bc1a6bb94#1619434007; s_getNewRepeat=1619432146483-Repeat; gpv=en-us%3Ecommunity%3Esnug%3ASNUG%20Proceedings; s_sq=%5B%5BB%5D%5D; s_lpt=1619432146549; s_gpn=4; s_ghn=7; _gat_ncAudienceInsightsGa=1; wlp=!/b8aaDKNNyDZ4C+NOHHNDUEOJTSXEE/QAo02o07ITWxd5UIkDQdY6AdkNEgSsXNklcqeO8wt4eFzNbw=; RT=\"z=1&dm=synopsys.com&si=917b46f8-77e6-490b-98de-38e5e4e5f80c&ss=knxxxwor&sl=44&tt=4p8k&bcn=%2F%2F684fc53b.akstat.io%2F&obo=3&ld=i5n3z&ul=i5rnk&hd=i5sle\"; _shibsession_64656661756c7468747470733a2f2f7777772e73796e6f707379732e636f6d2f73686962626f6c657468=_9e3234f2962356b23dca79f2c3c47ba8')
}
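r = requests.get(url, headers=headers, stream=True, timeout=30)
# Check the status before writing; otherwise an error page is saved as test.zip
r.raise_for_status()
with open('test.zip', 'wb') as f:
    # iter_content streams the body in chunks; r.content would load it all at once
    for chunk in r.iter_content(chunk_size=64 * 1024):
        f.write(chunk)
print("OK")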

The error:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://solvnet.okta.com/app/solvnetsynopsysexternalprod_www_1/exk7p0yz97c1sfHEI1t7/sso/saml?SAMLRequest=fZJdb4IwFIb%2FCum9FFBEGzFxukQTtxllu9iNKXA2iKVlPcWP%2FfqBzM1dzNv29Hk%2FTkfIC1GySWUyuYaPCtBYx0JIZOeLkFRaMsUxRyZ5AchMwjaThyXzbIeVWhmVKEGsCSJokys5VRKrAvQG9D5P4Hm9DElmTImM0sPhYONJqhJPaCeqoJssj2MlwGQ2oqIN16Orp01ErFltJJe8Qf4CUIm9BGOrneFnAC%2FLy%2BEFDEcDWnJRe0u3teLWpXDcBaVz%2BhwGiYtv8%2FuFawLaCDYRibWYhWTbC3jgx6nT7yWp76XgerHPk3TY9Qf9vucP6jHEChYSDZcmJJ7juR2n1%2FH6keswv8u6wSuxVt%2BF3OUyzeX77fbidgjZPIpWnTb2C2g8R64HyHjUGGRnYX21ldtYflkFGf9bPP4UP6JXGq1gyR5r6GK2UiJPTtZECHWYauAGQuISOm6f%2FP0z4y8%3D&RelayState=ss%3Amem%3Aa166eeb3abc22a94a14347c369089199dbdaf39968cd4f842871177944e44ced

But when I open that URL myself, the browser goes straight to downloading the file.

Also, the URL in the error message is different every time I run the script.
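A quick way to see where the request actually ends up is to compare the requested URL with `r.url` (the final URL after redirects; the intermediate hops are in `r.history`). The URL in the error points at `solvnet.okta.com/.../sso/saml` and carries `SAMLRequest` and `RelayState` parameters, which looks like a redirect to a SAML single sign-on login; a freshly generated `SAMLRequest` on each attempt would also explain why the URL changes every run. The browser probably succeeds because it already holds a valid SSO session and can complete that login handshake, which a plain `requests.get` does not do. The stdlib-only sketch below checks for that pattern; `classify_final_url` is a hypothetical helper, and the parameter names it looks for are a heuristic, not something Synopsys documents.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that typically appear when a site bounces the client to a
# SAML single sign-on page (a heuristic, not an exhaustive list).
SSO_PARAMS = {"SAMLRequest", "RelayState"}


def classify_final_url(requested_url: str, final_url: str) -> dict:
    """Compare the requested URL with the URL the redirect chain ended on."""
    req = urlsplit(requested_url)
    fin = urlsplit(final_url)
    params = set(parse_qs(fin.query).keys())
    return {
        # True when the redirects left the original host
        "host_changed": req.hostname != fin.hostname,
        "final_host": fin.hostname,
        # True when the final URL carries SAML parameters or an /sso/ path
        "looks_like_sso": bool(params & SSO_PARAMS) or "/sso/" in fin.path,
    }
```

After `r = requests.get(url, headers=headers, stream=True)` and before `r.raise_for_status()`, calling `classify_final_url(url, r.url)` and printing `[(h.status_code, h.url) for h in r.history]` should show whether the 404 comes from the file server or from the login host.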


  • CSDN专家-HGJ 2021-04-26 19:26

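    HTTP status 404 means the page or file was not found. Possible causes: the request is redirected to another page, a login is required, the headers being sent are wrong, or additional parameters are needed.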

    This answer was selected by the asker as the best answer.
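A sketch of how the redirect, login, and header causes could be handled in code, assuming the download only needs the browser's synopsys.com cookies (it may not, if the SSO session has expired or the server insists on a fresh SAML login). Two details of requests matter here: in current versions a `Cookie` header passed through `headers=` is not re-sent after a redirect (only cookies in the session's cookie jar are), and a hard-coded `Host` header is carried along to every redirect target, which by itself could plausibly yield a 404 from the SSO host. The sketch therefore moves the cookies into a session's jar and sets no `Host`. `parse_cookie_header` and `download` are hypothetical helpers; `download` accepts any `requests.Session`-like object so it can be exercised without network access.

```python
from typing import Dict


def parse_cookie_header(raw: str) -> Dict[str, str]:
    """Turn a browser 'Cookie:' header value into a {name: value} dict."""
    cookies: Dict[str, str] = {}
    for part in raw.split(";"):
        part = part.strip()
        if "=" not in part:
            continue  # skip empty fragments, e.g. from a trailing '; '
        name, value = part.split("=", 1)  # values may themselves contain '='
        cookies[name.strip()] = value.strip()  # a repeated name keeps its last value
    return cookies


def download(session, url: str, dest: str, chunk_size: int = 64 * 1024) -> str:
    """Stream `url` into `dest` and return the final URL after redirects.

    `session` is assumed to behave like requests.Session (hypothetical helper).
    """
    with session.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()  # fail before the output file is created
        if "text/html" in r.headers.get("Content-Type", ""):
            # A login page came back with status 200 instead of the zip
            raise RuntimeError(f"expected a file, got an HTML page at {r.url}")
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
        return r.url
```

Possible usage: `s = requests.Session()`, `s.headers["User-Agent"] = "..."`, `s.cookies.update(parse_cookie_header(raw_cookie))`, then `download(s, url, "test.zip")`. If the request still ends on `solvnet.okta.com`, the copied cookies alone are most likely not enough to get past the SSO login.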