Holden_Liu
2021-04-26 19:10

Python crawler: downloading a zip with requests fails with 404 Client Error

My code:

import requests
url = r'http://www.synopsys.com/news/pubs/snug/2015/austin/user-papers.zip'
headers = {
    'Host':r'www.synopsys.com',
    'Connection':r'keep-alive',
    'Upgrade-Insecure-Requests':r'1',
    'User-Agent':r'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.72 Safari/537.36 Edg/89.0.774.45',
    'Accept':r'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding':r'gzip, deflate, br',
    # Session cookie copied from the browser, split into two adjacent raw strings
    # so that the literal no longer spans a line break.
    'Cookie': (
        r'_mkto_trk=id:367-MRV-360&token:_mch-synopsys.com-1592796909619-52572; ELOQUA=GUID=15DC119A3A304AF28B4FDCEF3338A033; s_ecid=MCMID%7C58118164175654254161105037777457862346; s_vi=[CS]v1|2F78AB15051599D1-4000092903B786A6[CE]; _ga=GA1.2.342071361.1592796901; coveo_visitorId=d6972c4a-f26a-4860-8f76-ff116b757904; Hm_lvt_3216a4b0b0e27a0fb7653da6e32487d2=1612407745; __ncuid=1c65c8f9-b01d-4f5f-9523-0ae40ace8c78; AAMC_synopsys_0=REGION%7C11; lfuuid=a8b5811b-3a59-4536-8cf9-d7407a1553a9-c155236-sw1920-sh1080-ms1603962065892-r6903481; sat_track=true; BALANCEID=balancer.lb1; AMCVS_96E61CFE53295EF20A490D45%40AdobeOrg=1; LSKey[CoveoV2]coveo_visitorId=d6972c4a-f26a-4860-8f76-ff116b757904; s_pers=%20gpv_pageName%3D%252Fpage_login_timeout.html%7C1618564045859%3B%20s_nr%3D1618562245887-Repeat%7C1650098245887%3B; s_sess=%20s_cc%3Dtrue%3B%20s_ev23_persist%3Dsiengine.com%3B%20s_ev24_persist%3DCUSTOMER%3B%20s_ev29_persist%3DChina%3B%20s_ev30_persist%3DShanghai%3B%20s_sq%3Dsynopsyssolvnetplusprod%253D%252526pid%25253DDoc%2525253A2021.03%2525253AVC%25252520Formal%2525253Aindex.html%2525253ASynopsys%25252520VC%25252520Static%25252520Platform%25252520Reference%25252520Guide%252526pidt%25253D1%252526oid%25253Dfunction%25252528param_event%25252529%2525257B%25252527usestrict%25252527%2525253Bvarresult%2525252Cevent%2525252Cparent_div%2525253Bresult%2525253Dtrue%2525253B%2525252F%2525252FAccessevent%2525252F%2525252Fevent%2525253Dparam_%252526oidt%25253D2%252526ot%25253DSPAN%3B; iPlanetDirectoryPro=E2B58C430631E1C41DB2D4973E8FD15C; s_cc=true; check=true; s_eVar5=%5B%5BB%5D%5D; _gid=GA1.2.700700754.1619323589; _mkto_trk=id:367-MRV-360&token:_mch-synopsys.com-1592796909619-52572; '
        r'AMCV_96E61CFE53295EF20A490D45%40AdobeOrg=-715282455%7CMCIDTS%7C18743%7CMCMID%7C58118164175654254161105037777457862346%7CMCAAMLH-1620031430%7C11%7CMCAAMB-1620031430%7C6G1ynYcLPuiQxYZrsz_pkqfLG9yMXBpb2zX5dvJdYQJzPXImdj0y%7CMCOPTOUT-1619433830s%7CNONE%7CvVersion%7C4.2.0%7CMCAID%7CNONE%7CMCSYNCSOP%7C411-18750%7CMCCIDH%7C-349511492; TS01901308=01734f5e7ffb54914b6ebd23e0ba7dd1603ca53a16845011a2cfaad0cc1ebb3387d4f0489c468bd486f5acc186a214a50942d5339b; referrer_url=https://www.synopsys.com/community/snug/snug-world.html; AWSALB=pYF1pwFaSzwuD5sIzjkQdEBUjZH4qBJM8A9Vy+RQPvnb2B13Krh1XEXKkuCp+lP13Fk/D8ycOD2YFOre1cQYw1fhbUey5GqA6rhtzMVQaAmNjmgAWyANVSwyqLOt; AWSALBCORS=pYF1pwFaSzwuD5sIzjkQdEBUjZH4qBJM8A9Vy+RQPvnb2B13Krh1XEXKkuCp+lP13Fk/D8ycOD2YFOre1cQYw1fhbUey5GqA6rhtzMVQaAmNjmgAWyANVSwyqLOt; mbox=PC#1f1bc295ff044a04b1b4f2e42f00b586.38_0#1656041702|session#341f80b270fb424e828e060bc1a6bb94#1619434007; s_getNewRepeat=1619432146483-Repeat; gpv=en-us%3Ecommunity%3Esnug%3ASNUG%20Proceedings; s_sq=%5B%5BB%5D%5D; s_lpt=1619432146549; s_gpn=4; s_ghn=7; _gat_ncAudienceInsightsGa=1; wlp=!/b8aaDKNNyDZ4C+NOHHNDUEOJTSXEE/QAo02o07ITWxd5UIkDQdY6AdkNEgSsXNklcqeO8wt4eFzNbw=; RT=\"z=1&dm=synopsys.com&si=917b46f8-77e6-490b-98de-38e5e4e5f80c&ss=knxxxwor&sl=44&tt=4p8k&bcn=%2F%2F684fc53b.akstat.io%2F&obo=3&ld=i5n3z&ul=i5rnk&hd=i5sle\"; _shibsession_64656661756c7468747470733a2f2f7777772e73796e6f707379732e636f6d2f73686962626f6c657468=_9e3234f2962356b23dca79f2c3c47ba8'
    ),
}
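r = requests.get(url, headers=headers, stream=True)
# Fail on HTTP errors before anything is written to disk.
r.raise_for_status()
with open('test.zip', 'wb') as f:
    # Write the body in chunks instead of loading the whole archive into memory.
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)
print("OK")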

The error is as follows:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://solvnet.okta.com/app/solvnetsynopsysexternalprod_www_1/exk7p0yz97c1sfHEI1t7/sso/saml?SAMLRequest=fZJdb4IwFIb%2FCum9FFBEGzFxukQTtxllu9iNKXA2iKVlPcWP%2FfqBzM1dzNv29Hk%2FTkfIC1GySWUyuYaPCtBYx0JIZOeLkFRaMsUxRyZ5AchMwjaThyXzbIeVWhmVKEGsCSJokys5VRKrAvQG9D5P4Hm9DElmTImM0sPhYONJqhJPaCeqoJssj2MlwGQ2oqIN16Orp01ErFltJJe8Qf4CUIm9BGOrneFnAC%2FLy%2BEFDEcDWnJRe0u3teLWpXDcBaVz%2BhwGiYtv8%2FuFawLaCDYRibWYhWTbC3jgx6nT7yWp76XgerHPk3TY9Qf9vucP6jHEChYSDZcmJJ7juR2n1%2FH6keswv8u6wSuxVt%2BF3OUyzeX77fbidgjZPIpWnTb2C2g8R64HyHjUGGRnYX21ldtYflkFGf9bPP4UP6JXGq1gyR5r6GK2UiJPTtZECHWYauAGQuISOm6f%2FP0z4y8%3D&RelayState=ss%3Amem%3Aa166eeb3abc22a94a14347c369089199dbdaf39968cd4f842871177944e44ced

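However, when I click that URL myself, the browser starts the download right away.

Also, the URL in the error message is different every time I run the script.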
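
One plausible reading of the changing URL: the host in the traceback is solvnet.okta.com and the path ends in /sso/saml, so the request appears to have been redirected to a single sign-on page, and the SAMLRequest and RelayState query values look like tokens generated per request. The sketch below only takes such a URL apart with the standard library. `describe_redirect_target` is a hypothetical helper, and `sample` is a shortened stand-in, not the real URL from the traceback.

```python
from urllib.parse import urlsplit, parse_qs


def describe_redirect_target(error_url):
    """Return the host, path and sorted query-parameter names of a URL."""
    parts = urlsplit(error_url)
    params = parse_qs(parts.query)
    return parts.netloc, parts.path, sorted(params)


# Shortened, hypothetical stand-in for the URL shown in the traceback.
sample = (
    "https://solvnet.okta.com/app/example_app/sso/saml"
    "?SAMLRequest=abc%2Fdef&RelayState=ss%3Amem%3A1234"
)

host, path, names = describe_redirect_target(sample)
print(host)
print(path)
print(names)
```

If the host differs from www.synopsys.com, the 404 is being reported for the login endpoint rather than for the zip file itself.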


Answers

  • CSDN专家-HGJ 2021-04-26 19:26 (accepted answer)

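    HTTP status code 404 means the page or file was not found. Possible causes: the page redirects, a login is required, the headers passed are wrong, or other parameters are needed.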
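
The "page redirects" and "login required" causes can be checked from the response itself. This is a minimal sketch, assuming the `history`, `status_code` and `url` attributes that requests exposes on a response object. The helper names, the fake response and its 302 status are made up for illustration so the example runs without network access.

```python
from types import SimpleNamespace
from urllib.parse import urlsplit


def redirect_chain(response):
    """List (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def landed_on_other_host(response, expected_host):
    """True if the final URL is on a different host than the one requested."""
    return urlsplit(response.url).netloc != expected_host


# Hypothetical stand-in for a requests response that was redirected once.
fake = SimpleNamespace(
    status_code=404,
    url="https://solvnet.okta.com/app/example_app/sso/saml",
    history=[
        SimpleNamespace(
            status_code=302,
            url="http://www.synopsys.com/news/pubs/snug/2015/austin/user-papers.zip",
        )
    ],
)

chain = redirect_chain(fake)
for status, hop_url in chain:
    print(status, hop_url)
print(landed_on_other_host(fake, "www.synopsys.com"))
```

With the real response `r` from the question's script, `redirect_chain(r)` would list each hop. If the last entry is on a login host, the server is most likely asking for authentication that the copied headers do not satisfy.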

  • Try URL-encoding the URL, then test again.
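
For reference, percent-encoding with the standard library's `urllib.parse.quote` looks like the sketch below. The question's URL contains only letters, digits, dots, hyphens, colons and slashes, so encoding leaves it unchanged, which suggests that encoding alone may not be the cause here. The second URL is a made-up example containing a space.

```python
from urllib.parse import quote

# URL from the question: every character is already safe.
url = "http://www.synopsys.com/news/pubs/snug/2015/austin/user-papers.zip"
encoded = quote(url, safe=":/")
print(encoded == url)

# Hypothetical URL with a space, which does need encoding.
spaced = "http://example.com/my file.zip"
encoded_spaced = quote(spaced, safe=":/")
print(encoded_spaced)
```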

  • CSDN专家-九宝老师 2021-04-27 09:12

    A 404 simply means the address could not be found.
