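I've been working on a crawler recently. The request method is POST and the request Content-Type is application/x-www-form-urlencoded, which means the data is submitted as a form.
Looking at the response body, the response content type is XML, and the data I want to extract sits inside the new node: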
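As a rough sketch of how the new node could be pulled out once a valid response comes back: the XML below is a made-up stand-in, since the real response body is not reproduced in this post, and the `result` wrapper and the placeholder value are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the response body; the real payload is not shown here.
sample_response = """<?xml version="1.0"?>
<result succeed="true">
  <new>placeholder-value</new>
</result>"""

root = ET.fromstring(sample_response)
# iter() walks the whole tree, so the depth of <new> does not matter.
new_values = [node.text for node in root.iter("new")]
print(new_values)
```

Using `iter("new")` rather than a fixed path avoids having to know exactly how deep the node is nested in the real response.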
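First, build the headers:
The request parameters are in the request body:
The request parameters are also carried as XML. URL-decoding the __xml parameter gives the following content:
The submitted values sit in p tags, and those values are the only thing that changes from one request to the next. I found no sign of encryption.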
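The decoding step can be reproduced with the standard library. This is a minimal sketch: the `__xml` value is shortened here (the long `fs="..."` field list and a few attributes are left out for readability), so it is not the exact string sent by the browser.

```python
from urllib.parse import unquote

# Shortened copy of the __xml form field as captured from the request body.
# The fs="..." field list and some attributes are omitted for readability.
encoded_xml = (
    "%3Crpc%20id%3D%22datasetResult%22%20type%3D%22wrapper%22%3E"
    "%3Cps%3E%3Cp%20name%3D%22flag%22%3E1%3C/p%3E"
    "%3Cp%20name%3D%22carMark%22/%3E"
    "%3Cp%20name%3D%22RackNo%22%3E2FMDK3J95DBC93811%3C/p%3E"
    "%3C/ps%3E%3C/rpc%3E%0D%0A"
)

# unquote() turns the percent-escapes back into the raw XML text.
decoded_xml = unquote(encoded_xml)
print(decoded_xml)
```

The decoded text starts with `<rpc id="datasetResult"` and carries the query values in `<p name="...">` elements, which matches the description above.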
So I build params accordingly:
The code is as follows:
import requests

# Company intranet address; not reachable from outside the network
target = ".../dorado/smartweb2.RPC.d?__rpc=true"

# Headers copied from the captured browser request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Referer": ".../pages/policynewbiz/inputapplication/pmGDVehicleQuery.jsp?VEHICLELICENSE=&VIN=LEFYECG257HN34234&LICENSETYPE=&Kind=AUTOCOMPRENHENSIVEINSURANCE2014PRODUCT&",
    "Content-Type": "application/x-www-form-urlencoded",
    "Cookie": "jsessionidp09=X2tfdlLTBZH7xzKwnhSgh2W2N5374T0HnHWYQkl2MRShjBxfpKpW!1484787398; F5cookie=1410712842.6521.0000"
}

# Form fields copied from the captured request body
params = {
    "__type": "loadData",
    "__viewInstanceId": "org.view.policynewbiz.inputapplication.pmGDVehicleQuery~org.view.common.viewmodel.CpicViewModel",
    "__xml": '%3Crpc%20id%3D%22datasetResult%22%20type%3D%22wrapper%22%20objectClazz%3D%22%22%20pi%3D%221%22%20ps%3D%22100%22%20pc%3D%221%22%20prc%3D%220%22%20fs%3D%22vin%2ClicensePlateNo%2ClicensePlateType%2CengineNo%2CpmVehicleType%2CpmUserNature%2CineffectualDate%2CrejectDate%2CfirstRegisterDate%2ClastCheckDate%2CtransferDate%2CwholeWeight%2CratedPassengerCapacity%2Ctonnage%2Cdisplacement%2CmadeFactory%2Cmodel%2CbrandCN%2CbrandEN%2Chaulage%2Ccolor%2CfuelType%2CvehicleStatus%2CmotorTypeCode%22%3E%3Cps%3E%3Cp%20name%3D%22flag%22%3E1%3C/p%3E%3Cp%20name%3D%22carMark%22/%3E%3Cp%20name%3D%22RackNo%22%3E2FMDK3J95DBC93811%3C/p%3E%3C/ps%3E%3C/rpc%3E%0D%0A',
    "__rpc": "true",
}

# Submit the form and print the decoded response body
res = requests.post(url=target, headers=headers, data=params)
html = res.content.decode("utf-8")
print(html)
Running it returns an error:
D:\Users\CPIC\AppData\Local\Programs\Python\Python37\python.exe E:/Workspace/Python/SchoolInfo/test56.py
<?xml version="1.0"?>
<result succeed="false" >
<errorMessage>org.xml.sax.SAXParseException: Content is not allowed in prolog.</errorMessage>
<stackTrace><![CDATA[com.bstek.dorado.utils.xml.dom4j.Dom4jXmlBuilder.buildDocument(Dom4jXmlBuilder.java:59)
com.bstek.dorado.view.rpc.AbstractRPCHandler.init(AbstractRPCHandler.java:58)
com.bstek.dorado.view.rpc.LoadDataRPCHandler.init(LoadDataRPCHandler.java:41)
com.bstek.dorado.core.FilterHandle.doFilter(FilterHandle.java:131)
com.bstek.dorado.core.DoradoFilter.doFilter(DoradoFilter.java:70)
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
com.cpic.p09.auto.common.filter.CompatibleFilter.doFilter(CompatibleFilter.java:34)
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
com.cpic.p09.auto.common.filter.ClientCacheFilter.doFilter(ClientCacheFilter.java:71)
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3242)
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:1916)
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1366)
weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
]]></stackTrace>
<viewProperties></viewProperties>
</result>
Process finished with exit code 0
Has anyone run into this before? It's driving me crazy.
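One possible lead, not confirmed against this server: when a dict is passed as `data=`, requests form-encodes the values itself, so a value that is already percent-encoded would get encoded a second time. After the server decodes the form once, it would see text starting with `%3Crpc` instead of `<rpc`, which an XML parser could plausibly reject with "Content is not allowed in prolog". The sketch below imitates the form-encoding step with the standard library's `urlencode`; the shortened `__xml` value is an assumption made for readability.

```python
from urllib.parse import parse_qs, unquote, urlencode

# Shortened, already percent-encoded stand-in for the __xml value.
pre_encoded = "%3Crpc%20id%3D%22datasetResult%22%3E%3C/rpc%3E"

# Form-encoding the already-encoded text: every "%" becomes "%25".
body_double = urlencode({"__xml": pre_encoded})
# What a server would see after decoding the form body once.
seen_double = parse_qs(body_double)["__xml"][0]

# Decoding first, so the form-encoding step encodes the XML exactly once.
body_single = urlencode({"__xml": unquote(pre_encoded)})
seen_single = parse_qs(body_single)["__xml"][0]

print(seen_double)
print(seen_single)
```

If this turns out to be the cause, two things seem worth trying: pass `unquote(...)` of the `__xml` value in `params`, or send the captured request body unchanged as a plain string via `data=`, which requests transmits as-is.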