ZnPI 2019-08-28 10:01 Acceptance rate: 0%

Python crawler fails with org.xml.sax.SAXParseException: Content is not allowed in prolog.

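I have recently been working on a crawler. The request method is POST and the request content type is application/x-www-form-urlencoded, which means the data is submitted as a form.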

Looking at the response body, the response content type is XML, and the data I want to extract is inside the new node:
[image]

First, build the headers:
[image]

The request parameters are in the request body:
[image]

The request parameters are themselves wrapped in XML. URL-decoding the __xml parameter gives the following content:
[image]
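The decoding step can be reproduced with the standard library. This is a minimal sketch that uses a shortened copy of the `__xml` value from the code in this question (the long `fs="..."` attribute is left out for brevity); the variable names are illustrative only.

```python
from urllib.parse import unquote

# Shortened copy of the percent-encoded __xml value from the request body
# (the fs="..." attribute and a few others are omitted for brevity).
encoded_xml = (
    "%3Crpc%20id%3D%22datasetResult%22%20type%3D%22wrapper%22%3E"
    "%3Cps%3E%3Cp%20name%3D%22flag%22%3E1%3C/p%3E"
    "%3Cp%20name%3D%22carMark%22/%3E"
    "%3Cp%20name%3D%22RackNo%22%3E2FMDK3J95DBC93811%3C/p%3E"
    "%3C/ps%3E%3C/rpc%3E%0D%0A"
)

# unquote() turns each %XX escape back into the character it stands for.
decoded_xml = unquote(encoded_xml)
print(decoded_xml)
```

The decoded text starts with `<rpc` and ends with a CRLF, which matches the trailing `%0D%0A` in the captured value.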

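The submitted parameters sit inside p tags. Those parameters are the only thing that changes between requests, and I found no sign of encryption.
So I build params accordingly:
[image]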
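Since only the values inside the p tags change, the XML could be generated from those values instead of being pasted in as a fixed string. The sketch below is one way to do that with `xml.etree.ElementTree`; `build_rpc_xml` is a hypothetical helper, the attribute values are copied from the captured request, and the short field list is a stand-in for the full `fs` list. Whether the server accepts ElementTree's serialization (for example `<p name="carMark" />` with a space, and no trailing CRLF) has not been checked.

```python
import xml.etree.ElementTree as ET


def build_rpc_xml(rack_no, fields):
    """Build the <rpc> document with the query parameters in <p> tags.

    Hypothetical helper: attribute values mirror the captured request.
    """
    rpc = ET.Element("rpc", {
        "id": "datasetResult",
        "type": "wrapper",
        "objectClazz": "",
        "pi": "1",
        "ps": "100",
        "pc": "1",
        "prc": "0",
        "fs": ",".join(fields),
    })
    ps = ET.SubElement(rpc, "ps")
    ET.SubElement(ps, "p", {"name": "flag"}).text = "1"
    ET.SubElement(ps, "p", {"name": "carMark"})  # empty in the captured request
    ET.SubElement(ps, "p", {"name": "RackNo"}).text = rack_no
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(rpc, encoding="unicode")


# Shortened field list for illustration; the real request lists many more.
xml_text = build_rpc_xml("2FMDK3J95DBC93811", ["vin", "licensePlateNo", "engineNo"])
print(xml_text)
```

Building the document this way also escapes special characters in the values, which plain string concatenation would not do.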

The code is as follows:

import requests


target = ".../dorado/smartweb2.RPC.d?__rpc=true"    # company intranet address, not reachable from outside
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Referer": ".../pages/policynewbiz/inputapplication/pmGDVehicleQuery.jsp?VEHICLELICENSE=&VIN=LEFYECG257HN34234&LICENSETYPE=&Kind=AUTOCOMPRENHENSIVEINSURANCE2014PRODUCT&",
    "Content-Type": "application/x-www-form-urlencoded",
    "Cookie": "jsessionidp09=X2tfdlLTBZH7xzKwnhSgh2W2N5374T0HnHWYQkl2MRShjBxfpKpW!1484787398; F5cookie=1410712842.6521.0000"
}

params = {
    "__type": "loadData",
    "__viewInstanceId": "org.view.policynewbiz.inputapplication.pmGDVehicleQuery~org.view.common.viewmodel.CpicViewModel",
    "__xml": '%3Crpc%20id%3D%22datasetResult%22%20type%3D%22wrapper%22%20objectClazz%3D%22%22%20pi%3D%221%22%20ps%3D%22100%22%20pc%3D%221%22%20prc%3D%220%22%20fs%3D%22vin%2ClicensePlateNo%2ClicensePlateType%2CengineNo%2CpmVehicleType%2CpmUserNature%2CineffectualDate%2CrejectDate%2CfirstRegisterDate%2ClastCheckDate%2CtransferDate%2CwholeWeight%2CratedPassengerCapacity%2Ctonnage%2Cdisplacement%2CmadeFactory%2Cmodel%2CbrandCN%2CbrandEN%2Chaulage%2Ccolor%2CfuelType%2CvehicleStatus%2CmotorTypeCode%22%3E%3Cps%3E%3Cp%20name%3D%22flag%22%3E1%3C/p%3E%3Cp%20name%3D%22carMark%22/%3E%3Cp%20name%3D%22RackNo%22%3E2FMDK3J95DBC93811%3C/p%3E%3C/ps%3E%3C/rpc%3E%0D%0A',
    "__rpc": "true",
}

res = requests.post(url=target, headers=headers, data=params)
html = res.content.decode("utf-8")
print(html)

Running it returns an error:

D:\Users\CPIC\AppData\Local\Programs\Python\Python37\python.exe E:/Workspace/Python/SchoolInfo/test56.py
<?xml version="1.0"?>
<result succeed="false" >
<errorMessage>org.xml.sax.SAXParseException: Content is not allowed in prolog.</errorMessage>
<stackTrace><![CDATA[com.bstek.dorado.utils.xml.dom4j.Dom4jXmlBuilder.buildDocument(Dom4jXmlBuilder.java:59)
com.bstek.dorado.view.rpc.AbstractRPCHandler.init(AbstractRPCHandler.java:58)
com.bstek.dorado.view.rpc.LoadDataRPCHandler.init(LoadDataRPCHandler.java:41)
com.bstek.dorado.core.FilterHandle.doFilter(FilterHandle.java:131)
com.bstek.dorado.core.DoradoFilter.doFilter(DoradoFilter.java:70)
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
com.cpic.p09.auto.common.filter.CompatibleFilter.doFilter(CompatibleFilter.java:34)
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:43)
com.cpic.p09.auto.common.filter.ClientCacheFilter.doFilter(ClientCacheFilter.java:71)
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3242)
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:1916)
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1366)
weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
]]></stackTrace>
<viewProperties></viewProperties>
</result>

Process finished with exit code 0
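Note that the process exits with code 0 and the failure is only reported inside the response body, so a script has to inspect the returned XML itself. A minimal sketch, using a shortened copy of the error response shown above: `check_result` is a hypothetical helper, and treating `succeed="true"` as the success value is an assumption, since only the failure case appears in this thread.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the error response (stack trace omitted).
response_text = """<?xml version="1.0"?>
<result succeed="false" >
<errorMessage>org.xml.sax.SAXParseException: Content is not allowed in prolog.</errorMessage>
<viewProperties></viewProperties>
</result>"""


def check_result(xml_text):
    """Return (succeeded, error_message) for a <result> document."""
    root = ET.fromstring(xml_text)
    # Assumption: a successful call reports succeed="true".
    succeeded = root.get("succeed") == "true"
    # findtext() returns None when there is no <errorMessage> child.
    error = root.findtext("errorMessage")
    return succeeded, error


ok, error = check_result(response_text)
print(ok, error)
```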

Has anyone run into this before? It is driving me crazy.


1 answer

  • 繁华三千东流水 2019-08-28 10:23

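    result succeed="false" # the call reported failure
    org.xml.sax.SAXParseException: Content is not allowed in prolog. # the XML parser found unexpected content before the start of the XML document, so the submitted XML could not be parsed
    I tried the same code structure against a different URL and it ran fine, so I think the problem is most likely in the parameters.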
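One plausible cause in the parameters, not confirmed in this thread: the `__xml` value in the question is already percent-encoded, and `requests` form-encodes the values of a dict passed as `data=` a second time. The server would then decode once and hand text starting with `%3C` rather than `<` to the XML parser, which matches this error. The sketch below shows the effect with `urllib.parse.urlencode` on a shortened value; the variable names are illustrative.

```python
from urllib.parse import unquote, unquote_plus, urlencode

# Shortened copy of the already percent-encoded value from the question.
captured = "%3Crpc%20id%3D%22datasetResult%22%3E%3C/rpc%3E"

# Encoding the already-encoded value escapes every "%" again as "%25".
double_encoded = urlencode({"__xml": captured})
print(double_encoded)

# After one round of form decoding the receiver still sees "%3Crpc...",
# which is not XML.
seen_when_double = unquote_plus(double_encoded.split("=", 1)[1])
print(seen_when_double)

# Possible fix: decode the captured value first, so that form encoding
# is applied exactly once.
params = {"__xml": unquote(captured)}
single_encoded = urlencode(params)
seen_when_single = unquote_plus(single_encoded.split("=", 1)[1])
print(seen_when_single)
```

With `requests`, the equivalent change would be to put the decoded XML into the `params` dict before calling `requests.post(..., data=params)`. Another option is to pass the whole pre-encoded body as a single string, which `requests` sends unchanged.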
