ancientdate 2020-08-17 19:20 Acceptance rate: 50%

Python web scraper returns response 412, can anyone help?

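I have recently been learning to scrape data from websites and ran into a problem: the get step returns a response 412 error, and parsing the result with BeautifulSoup only yields a jumble of meaningless letters. Any help would be appreciated.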
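
For reference, 412 is the HTTP status "Precondition Failed", so the body returned with it is not the requested data page. A minimal sketch, using only the standard library, of turning a status code into a readable label and refusing to parse a non-2xx response; `describe_status` and `ensure_ok` are hypothetical helper names, not part of requests or BeautifulSoup:

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return a readable label such as '412 Precondition Failed'."""
    try:
        return f"{status_code} {HTTPStatus(status_code).phrase}"
    except ValueError:
        # The code is not one of the statuses registered in http.HTTPStatus
        return f"{status_code} (unregistered status)"


def ensure_ok(status_code):
    """Raise before any HTML parsing if the status is not in the 2xx range."""
    if not 200 <= status_code < 300:
        raise RuntimeError(f"unexpected response: {describe_status(status_code)}")


print(describe_status(412))  # prints "412 Precondition Failed"
```

With the code in this question, the check would be `ensure_ok(re_zf.status_code)` right after the `get` call. requests also offers `re_zf.raise_for_status()`, which raises an `HTTPError` for 4xx and 5xx responses.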

import pandas as pd
from bs4 import BeautifulSoup
from requests import session

url = 'http://www.czce.com.cn/cn/DFSStaticFiles/Future/2020/20200817/FutureDataDaily.htm'

request_header = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                'Accept-Encoding': 'gzip, deflate',
                'Accept-Language': 'zh-CN,zh;q=0.9',
                'Cache-Control': 'max-age=0',
                'Connection': 'keep-alive',
                'Cookie': 'XquW6dFMPxV380S=6pf_jtMjsFdbr4MJbUV9QBpnNlG48BXQNNLi2uvrBFjlpK0cO86Eqozbn8dpoC40; UM_distinctid=1738fbd1989b49-06ef27097ca487-504f221b-1fa400-1738fbd198a7a4; CNZZDATA1264458526=411363803-1578288241-null%7C1597655705; XquW6dFMPxV380T=5xLc8RiEIZoxPydzrOCUyx8fMfGBEfp1B9G0pvO37ActOX7mn8G3BY3As4O0LSnaRIx5v_RBultmAKBAiSV7CD4mxFcNI70rMoT5pnpenZKx3Ah_iKGmPjhLCQnjDuZ2QEFRo1zYkOIKmxdqApiKoZV7XrTIL4kf5o7qUPHoW9ELgnHudg4m4xjyAXi0OsovM3si3lNEbGMaYcB2nEw7Xi_krbO_gwwB3vuzJlSjYQnDEgLlACnZiBbBmmV1K0kijpAtT.OyIBTLELIQJYng_OQWxgThZH5KQ2dg.rznxYb3IOPUHeTOI4dYpdQyF2DWjLuZrikKueEP73Zg76kCNQdlh59pcFgcFmEn1qKUQMGvYd0YE4ZOALyHT05LWkrFTUZV',
                'Host': 'www.czce.com.cn',
                'Referer': 'http://www.czce.com.cn/cn/jysj/mrhq/H770301index_1.htm?yXi2jvqo=wtpUrarCZRVXIezUj5hAi2NwkLCal_5yMkCtTrJqiY3qqEE',
                'Upgrade-Insecure-Requests': '1',
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}


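s_zf = session()

re_zf = s_zf.get(url, headers=request_header)

re_zf.encoding = 'utf-8'

# Name the parser explicitly; otherwise BeautifulSoup guesses one and emits a warning
bs_zf = BeautifulSoup(re_zf.text, 'html.parser')

# Header cells of the first table (raises IndexError if the page contains no table)
bs_zf_th = bs_zf.find_all('table')[0].find_all('tr')[0].find_all('td')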


# Column names are the text of the header cells
list_columns_zf = [th.getText() for th in bs_zf_th]

# Data rows: skip the header row and the last row of the first table
result_zf = bs_zf.find_all('table')[0].find_all('tr')[1:-1]

rows_zf = []
for result in result_zf:
    # Remove line breaks, tabs and spaces from each cell
    rows_zf.append([
        td.getText().replace('\n', '').replace('\t', '').replace('\r', '').replace(' ', '')
        for td in result.find_all('td')
    ])

# Build the DataFrame in one step; DataFrame.append was removed in pandas 2.0
df_zf = pd.DataFrame(rows_zf, columns=list_columns_zf)

The output is as follows:

BeautifulSoup(re_zf.text)
Out[20]: 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="{qh8TBAEfLaGpMl4608r0qKeoWU.HTRT1vZl4FJ1lbzZqqc80~FpqGeVc04hvaTRoZdq0puRpyycP7fIupRwll5RUeCxKqsKqJPknEt36SdE6GNQveDQYWDMG2Lop9xmrSumVZf16eIHlVi8opOEpqURfJ6ECZSw6eulbVeUcT1tAZfso0HEb3hcUZEQqLXsAfnmoZ.R6RScPW88OqLqAqzJ1eCmfVJHSQphAydlOV5l6LGca98Hr26Fu3ErpG6HaGzqbWXm6Z0qA0BJsp7Qq9JJUJbtS9XMfAlhGVjtAypF2E2VU0ox2yjqOw1QYZnlvqiJpQXUAQ2F932pU0sx9NjoOwitq3oqOEym9aTHoE7ISQplGZoWpVhSss7c49ievMbEeTe7sUcmwfdqqqqqqqqJ1597659565179hlwzRI97pwBjl3650qlkD9Ok3agD6a49abqqqqqqqqqh.wevSHDj8ZIk675r4r0qHHxAJNaP2kjAJNaL}LH.aWfG8xrJrInY8vU4Yp7rcuV.mR.mcTR8rzLmDjYMTIBYB78tmIBYML3dyFzlCSshNiLm8_VHfJNTtS1BlR5fvnR8yOBAu9r5ZlfmDc1geMOV.tURwMf0jc8gpMfmXl17WRb28xKglRSfcsUyeXv25tFLTi9r4ADJwtP9XQU7GluadDVRJt9VXQYCYpRAQcN0ZNs5XHj3n2ujUiRV59kCVHNQ_qqqqq!x7z,aac,amr,asm,avi,bak,bat,bmp,bin,c,cab,css,csv,com,cpp,dat,dll,doc,dot,docx,exe,eot,fla,flc,fon,fot,font,gdb,gif,gz,gho,hlp,hpp,htc,ico,ini,inf,ins,iso,js,jar,jpg,jpeg,json,java,lib,log,mid,mp4,mpa,m4a,mp3,mpg,mkv,mod,mov,mim,mpp,msi,mpeg,obj,ocx,ogg,olb,ole,otf,py,pyc,pas,pgm,ppm,pps,ppt,pdf,pptx,png,pic,pli,psd,qif,qtx,ra,rm,ram,rmvb,reg,res,rtf,rar,so,sbl,sfx,swa,swf,svg,sys,tar,taz,tif,tiff,torrent,txt,ttf,vsd,vss,vsw,vxd,woff,woff2,wmv,wma,wav,wps,xbm,xpm,xls,xlsx,xsl,xml,z,zip,apk,plist,ipaqqYG8avoHCi3mQ371K38q0fnt1083211841r0Vm0VbQcySEUl28cpaEoLXq{1t7Gmj0sbMLSybm_zUjpxPliOs_Wm2T5bU_mH5SO2Di3myp.jDX91_S60Mt28jmP4M.y8GK_qAgysuVtApam42100{mTDbAYW3Wcdl6VDDAdbGxFG;PwRuQjDaKPhGK18CytJcVq;sS7OZNXZ7Fg4RDlSAMpsuA;qqqqq!xQ_eHSf6qr7YsSG.kmL2p6p.1meEwb2HAmgwpycusowlQ6SIGhixs.StBmByp2uDkAjZ7LqvaiMlOGk8NxFg1fVXDDRYrX1U9xNQ8fGFhmEawLqsOHLL3LkB7DEl67ntQcRJDbOtkrFamXS_XJ.zmvVDTxL9DfOOFl3ekXA1cHRVtv1j9Jh9NaubymxgvfsUmrRgRB2C8m70K90jVxQG30SnRpEg.FuwlkkWKF1TLD00lQ1p3m67wFGrHolquFsgyDTlQR9p1l1EX8fabEVrFQrRVHYa.MpRLloVgQkENHcalVclSH00X8cmkDDZdwVToE2gfIG0MWSgARuSZH0Z_sqm5x69PqSSZAK0A31RslUqGRu2gqalWc17kroWlck3mW93UkpLKxGLiikZmWGQqEfrdiUAAR93wlAGWQPA3q1ykrPEQkAJkRkLZKGq5Wk2PqSaliul4JayK8GE_Dc9p3m3erOA_Wf7jlaqrVu7Xoq7fWpRIW29u10Lqq"/><!--[if lt IE 
9]><script r='m'>document.createElement("section")</script><![endif]--><script charset="iso-8859-1" r="m" src="/rYAJslLyF6jA/oGj1PdjdnKJ2.6a49ab9.js" type="text/javascript"></script><script r="m" type="text/javascript">(function(){var _$BD=0,_$py=[[9,6,1,6,5,4,3,0,8,7,8,2],[10,2,53,2,54,62,40,85,77,17,34,17,50,89,13,70,13,17,84,88,86,27,31,24,95,33,69,45,28,37,17,60,52,83,42,87,61,81,57,39,11,5,81,98,0,92,44,6,81,16,55,63,62,3,81,46,4,30,12,96,73,41,23,81,68,80,81,35,8,1,7,17,51,79,66,1,90,82,17,97,1,17,76,62,99,38,20,48,26,17,93,15,36,9,29,94,65,19,43,47,64,75,21,67,59,71,58,14,72,32,22,56,18,25,74,78,49,91,17],[19,1,2,1,0,27,11,21,13,26,9,17,25,31,20,9,29,30,6,22,14,28,18,15,16,12,16,32,16,7,16,23,5,16,8,3,8,16,33,10,4,24,9]
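
The output above contains only a long `meta` value and obfuscated `script` tags, with no `table` element. That is consistent with the server returning a JavaScript challenge page instead of the data page, although this reading is an assumption drawn from the output, not something the site documents. A minimal sketch of detecting such a page before calling `find_all('table')[0]`, using only the standard library; `TagCounter` and `looks_like_js_challenge` are hypothetical names, and the heuristic (non-2xx status, scripts present, no table) is an assumption:

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each start tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def looks_like_js_challenge(status_code, html_text):
    """Heuristic (an assumption): non-2xx status, <script> present, no <table>."""
    counter = TagCounter()
    counter.feed(html_text)
    has_script = counter.counts.get("script", 0) > 0
    has_table = counter.counts.get("table", 0) > 0
    return not (200 <= status_code < 300) and has_script and not has_table


# Hypothetical samples: a script-only page and a page with a data table
challenge_page = '<html><head><meta content="x"/><script r="m" src="/a.js"></script></head></html>'
data_page = '<html><body><table><tr><td>a</td></tr></table></body></html>'

print(looks_like_js_challenge(412, challenge_page))
print(looks_like_js_challenge(200, data_page))
```

With the code in this question, the call would be `looks_like_js_challenge(re_zf.status_code, re_zf.text)`. When it returns true, the table lookup cannot succeed, because the page holds no table yet.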

3 answers

  • 脆果儿不黏(初雪) 2021-06-30 19:37

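    Are you also trying to scrape the daily data from the Zhengzhou Commodity Exchange (郑商所)? I ran into the same problem. Did you manage to solve it?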
