duchang110 2021-09-29 08:24

Scrapy request to a URL captured in the Chrome console returns 404; entering the URL directly in the browser also returns 404

The request URL as shown in the Chrome console:

img

The request headers are as follows:

img

The spider I wrote:


import json
import logging

import scrapy
from scrapy.downloadermiddlewares.cookies import CookiesMiddleware

from kemai.items import KemaiItem2
from kemai.items import a


class KemaispiderSpider(scrapy.Spider):
    name = 'kemaispideryibao'
    allowed_domains = ["10.118.130.127:8001"]
    #start_urls = [constant.getHostUrl()]
    #pagestart=0
    hosturl = "http://10.118.130.127:8001/"
    # Headers copied from the request shown in the Chrome console.
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
        'Host': '10.118.130.127:8001',
        'Referer': 'http://10.118.130.127:8001/dip/logonDipsMonitor.jsp',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36',
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN, zh;q = 0.9',
        'Connection': 'keep-alive',
        'Origin': 'http://10.118.130.127:8001',
        'Cookie': 'loginName = cxcwz;yybm = 37170101; overtimeRedireect=DIPSMONITOR; SF_cookie_6=27943769; JSESSIONID=pEgrYU2R6JiKYZInaouDfkuXkhlJTvjQ!466691487!15742263',
        'X-Requested-With': 'XMLHttpRequest',
    }
    # searchParam = {"gridSessionID":"53880640_b4fd_4d02_ab79_43b241cff015","page":"1","pageSize":"25","updateBeginRowIndex":"0","updateRows":"[]"}

    def start_requests(self):
        # Load the login page first, then submit the credentials in login().
        yield scrapy.Request("http://10.118.130.127:8001/dip/logonDipsMonitor.jsp", callback=self.login)

    def login(self, response):
        # Login parameters, serialized as JSON into the request body.
        payload = {
            "method": "doLogonDipsMonitor",
            "_xmlString": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p><s userid=\"cxcwz\"/><s passwd=\"b9e79361b4040a3f3a71668163d2f058\"/><s passWordLogSign=\"0\"/><s current_yybm=\"37170101\"/></p>",
            "_random": "0.015842269101861817",
        }
        yield scrapy.Request(
            url="http://10.118.130.127:8001/dip/dipsLogon.do",
            body=json.dumps(payload),
            dont_filter=True,
            headers=self.headers,
            callback=self.parse)

    def parse(self, response):
        print()

Output when run in PyCharm:
2021-09-28 17:50:32 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-09-28 17:50:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://10.118.130.127:8001/dip/logonDipsMonitor.jsp> (referer: None)
2021-09-28 17:50:34 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://10.118.130.127:8001/dip/dipsLogon.do> (failed 1 times): 404 Not Found
2021-09-28 17:50:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://10.118.130.127:8001/dip/dipsLogon.do> (failed 2 times): 404 Not Found
2021-09-28 17:50:42 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET http://10.118.130.127:8001/dip/dipsLogon.do> (failed 3 times): 404 Not Found
2021-09-28 17:50:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://10.118.130.127:8001/dip/dipsLogon.do> (referer: http://10.118.130.127:8001/dip/logonDipsMonitor.jsp)
2021-09-28 17:50:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 http://10.118.130.127:8001/dip/dipsLogon.do>: HTTP status code is not handled or not allowed


2 answers

  • CSDN专家-微编程 2021-09-29 09:15

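    A 404 means the requested resource could not be found, which usually points to a path problem. Look at your log: Scrapy is sending a GET request, but the browser sends a POST. The server handles the two methods differently, so the GET does not find the path. If that turns out not to be the cause, check the path again carefully.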
