敲敲我的脑袋 2023-10-31 10:20
Closed

Scraping a dynamic web page: the data I get differs from what the page shows

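I'm scraping data from the dynamic web page below. I located the corresponding data endpoint and requested the data from it, but what I end up getting is what's shown in the second screenshot: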

[Screenshot 1]

[Screenshot 2: the data returned by the scraper]

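It's completely different from the data on the page, and when I search the site for the scraped records, I can't find them. Could someone take a look at what's going on?
Here is my code: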

import csv

import pandas as pd
import requests

# Detail page URL pattern, e.g. base_url + '&source=3&tenderCode=cscec202309120000226457'
base_url = 'https://yzmtg.yzw.cn/portal/tender/winner/detail?'
# List endpoint being queried
url = 'https://yzmtg.yzw.cn/portal/tender/pageWinners'

headers = {
    'content-type': 'application/json',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.1.4031 SLBChan/30',
    # Session credentials copied from the logged-in browser; substitute your own values
    'x-yzw-auth-token': '<x-yzw-auth-token from the browser>',
    'referer': 'https://xy.yzw.cn/search/sj/bid',
    'cookie': '<cookie string from the browser>',
}

# Fields extracted from each record in the JSON response
FIELDS = ['area', 'completeTime', 'publishDate', 'tenderCompanyName', 'tenderName']


def iter_records(node):
    """Walk the parsed JSON and yield every object that has a tenderName key."""
    if isinstance(node, dict):
        if 'tenderName' in node:
            yield node
        else:
            for value in node.values():
                yield from iter_records(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_records(item)


# Open the CSV once and write one row per record
# (looping over regex matches and overwriting a dict kept only the last record of each page)
with open('data_yzw2.csv', mode='a', newline='', encoding='utf-8') as f:
    csvwriter = csv.writer(f)
    for i in range(1, 1001):
        data = {
            "pageNum": i,
            "pageSize": 10,
        }
        # json= serializes the body, equivalent to data=json.dumps(data)
        res = requests.post(url=url, headers=headers, json=data)
        res.raise_for_status()
        for record in iter_records(res.json()):
            csvwriter.writerow([record.get(field, '') for field in FIELDS])
        print(f'page {i} done')

# The CSV has no header row, so supply the column names when reading it back
data = pd.read_csv('data_yzw2.csv', header=None, names=FIELDS)
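One way to narrow down why the scraped records differ from the page is to compare the request the page itself sends (visible in the browser's DevTools Network tab, under the request's payload) with the one the script sends. The sketch below is a minimal stdlib-only helper for that; `diff_payloads`, `same_endpoint`, and the `keyword` field in the sample body are hypothetical, and the sample body only stands in for whatever the page really sends.

```python
import json
from urllib.parse import urlsplit


def diff_payloads(browser: dict, script: dict) -> dict:
    """Report keys the script omits, adds, or sends with a different value."""
    missing = sorted(k for k in browser if k not in script)
    extra = sorted(k for k in script if k not in browser)
    changed = sorted(k for k in browser if k in script and browser[k] != script[k])
    return {"missing": missing, "extra": extra, "changed": changed}


def same_endpoint(a: str, b: str) -> bool:
    """True if two URLs share scheme, host and path (query string ignored)."""
    return urlsplit(a)[:3] == urlsplit(b)[:3]


# Hypothetical body: replace it with the real one copied from DevTools.
browser_body = json.loads('{"pageNum": 1, "pageSize": 10, "keyword": "example"}')
script_body = {"pageNum": 1, "pageSize": 10}

print(diff_payloads(browser_body, script_body))
# -> {'missing': ['keyword'], 'extra': [], 'changed': []}
print(same_endpoint("https://yzmtg.yzw.cn/portal/tender/pageWinners",
                    "https://yzmtg.yzw.cn/portal/tender/pageWinners?x=1"))
# -> True
```

A non-empty `missing` list, or `same_endpoint` returning False against the URL the page actually calls, would indicate the script is not asking the server the same question the page does.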


2 answers

  • 二九筒 2023-10-31 11:27

    You're only sending the pagination parameters and none of the other parameters. Also, isn't the endpoint different too?

    [Screenshot]

    Selected by the asker as the best answer.
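Following the answer's point, one approach is to reuse the page's complete request body and change only the page number, rather than sending `pageNum`/`pageSize` alone. A minimal sketch, assuming the body has been copied verbatim from DevTools; `BROWSER_BODY`, `someFilter`, `build_payload`, and `iter_payloads` are hypothetical placeholders, not fields or functions of this site's API.

```python
import copy
import json

# Hypothetical template: paste the exact JSON body the page sends here.
# "someFilter" is a placeholder, not a real field of this site's API.
BROWSER_BODY = '{"pageNum": 1, "pageSize": 10, "someFilter": "copied value"}'


def build_payload(template: dict, page_num: int) -> dict:
    """Copy the browser's body unchanged except for the page number."""
    payload = copy.deepcopy(template)
    payload["pageNum"] = page_num
    return payload


def iter_payloads(template: dict, last_page: int):
    """Yield one payload per page, from 1 to last_page inclusive."""
    for page in range(1, last_page + 1):
        yield build_payload(template, page)


template = json.loads(BROWSER_BODY)
for payload in iter_payloads(template, 3):
    # Each payload would be sent to the endpoint the page itself calls,
    # e.g. requests.post(endpoint, headers=headers, json=payload)
    print(payload)
```

Deep-copying the template keeps nested filter objects from being shared or mutated between pages, so every request carries the same filters as the browser's.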


Question events

  • Closed automatically by the system, Nov 9
  • Answer accepted, Nov 1
  • Question created, Oct 31
