Zhang_lo 2023-09-02 18:57 采纳率: 50%
浏览 12
已结题

爬取豆瓣阅读页面代码返回出现问题

我爬取豆瓣阅读其中一个子网页,返回的内容不太正常,跟页面源码不同,进行不了页面解析,请问这个是怎么回事呢。
hash_url = 'https://read.douban.com/category/1?sort=hot&page=1%27
resp = requests.get(hash_url, headers=headers)
print(resp.text)
打印出现
Ark.kindTree = [{"children": [{"children": [], "id": 501, "name": "\u8a00\u60c5\u5c0f\u8bf4"}, {"children": [], "id": 532, "name": "\u5973\u6027\u5c0f\u8bf4"}, {"children": [], "id": 508, "name": "\u60ac\u7591\u5c0f\u8bf4"}, {"children": [], "id": 506, "name": "\u5e7b\u60f3\u5c0f\u8bf4"}, {"children": [], "id": 505, "name": "\u79d1\u5e7b\u5c0f\u8bf4"},
等内容,请问是网站反爬策略吗

img

  • 写回答

2条回答 默认 最新

  • cjh4312 2023-09-02 19:26
    关注

    需要这个吗

    
    import requests
    import pandas as pd
    headers = {
    
        'Referer': 'https://read.douban.com/category/1?sort=hot&page=1%27',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
    }
    
    json_data = {
        'sort': 'hot',
        'page': 1,
        'kind': 1,
        'query': '\n    query getFilterWorksList($works_ids: [ID!]) {\n      worksList(worksIds: $works_ids) {\n        \n    \n    title\n    cover(useSmall: false)\n    url\n    isBundle\n    coverLabel(preferVip: true)\n  \n    \n  url\n  title\n\n    \n  author {\n    name\n    url\n  }\n  origAuthor {\n    name\n    url\n  }\n  translator {\n    name\n    url\n  }\n\n    \n  abstract\n  authorHighlight\n  editorHighlight\n\n    \n    isOrigin\n    kinds {\n      \n    name @skip(if: true)\n    shortName @include(if: true)\n    id\n  \n    }\n    ... on WorksBase @include(if: true) {\n      wordCount\n      wordCountUnit\n    }\n    ... on WorksBase @include(if: false) {\n      inLibraryCount\n    }\n    ... on WorksBase @include(if: false) {\n      \n    isEssay\n    \n    ... on EssayWorks {\n      favorCount\n    }\n  \n    \n    \n    averageRating\n    ratingCount\n    url\n    isColumn\n    isFinished\n  \n  \n  \n    }\n    ... on EbookWorks @include(if: false) {\n      \n    ... on EbookWorks {\n      book {\n        url\n        averageRating\n        ratingCount\n      }\n    }\n  \n    }\n    ... on WorksBase @include(if: false) {\n      isColumn\n      isEssay\n      onSaleTime\n      ... on ColumnWorks {\n        updateTime\n      }\n    }\n    ... on WorksBase @include(if: true) {\n      isColumn\n      ... on ColumnWorks {\n        isFinished\n      }\n    }\n    ... on EssayWorks {\n      essayActivityData {\n        \n    title\n    uri\n    tag {\n      name\n      color\n      background\n      icon2x\n      icon3x\n      iconSize {\n        height\n      }\n      iconPosition {\n        x y\n      }\n    }\n  \n      }\n    }\n    highlightTags {\n      name\n    }\n    ... on WorksBase @include(if: false) {\n      fanfiction {\n        tags {\n          id\n          name\n          url\n        }\n      }\n    }\n  \n    \n  ... on WorksBase {\n    copyrightInfo {\n      newlyAdapted\n      newlyPublished\n      adaptedName\n      publishedName\n    }\n  }\n\n    isInLibrary\n    ... on WorksBase @include(if: false) {\n      \n    fixedPrice\n    salesPrice\n    isRebate\n  \n    }\n    ... on EbookWorks {\n      \n    fixedPrice\n    salesPrice\n    isRebate\n  \n    }\n    ... on WorksBase @include(if: true) {\n      ... on EbookWorks {\n        id\n        isPurchased\n        isInWishlist\n      }\n    }\n    ... on WorksBase @include(if: false) {\n      fanfiction {\n        fandoms {\n          title\n          url\n        }\n      }\n    }\n    ... on WorksBase @include(if: false) {\n      fanfiction {\n        kudoCount\n      }\n    }\n  \n        id\n        isOrigin\n        isEssay\n      }\n    }\n  ',
        'variables': {},
    }
    
    response = requests.post('https://read.douban.com/j/kind/', headers=headers, json=json_data)
    data=pd.DataFrame(response.json()['list'])
    

    img

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

问题事件

  • 系统已结题 11月29日
  • 已采纳回答 11月21日
  • 创建了问题 9月2日

悬赏问题

  • ¥30 arduino vector defined in discarded section `.text' of wiring.c.o (symbol from plugin)
  • ¥20 关于#c++#的问题:(2)运算二叉树·表达式一般由一个运算符和两个操作数组成:(相关搜索:二叉树遍历)
  • ¥20 如何训练大模型在复杂因素组成的系统中求得最优解
  • ¥15 关于#r语言#的问题:在进行倾向性评分匹配时,使用“match it"包提示”错误于eval(family$initialize): y值必需满足0 <= y <= 1“请问在进行PSM时
  • ¥45 求17位带符号原码乘法器verilog代码
  • ¥20 PySide6扩展QLable实现Word一样的图片裁剪框
  • ¥15 matlab数据降噪处理,提高数据的可信度,确保峰值信号的不损失?
  • ¥15 怎么看我在bios每次修改的日志
  • ¥15 python+mysql图书管理系统
  • ¥15 Questasim Error: (vcom-13)