YourSaDaddy 2022-08-22 12:01 Acceptance rate: 0%

Crawling: How to handle href=javascript:return;?

I am currently crawling documents on the EU's public web page: https://op.europa.eu/en/publication-detail/-/publication/c2c32dd3-f83c-11ec-b94a-01aa75ed71a1/language-en/format-PDF/source-264104800
?%ra=link
How can I get the URL of the HTML version of the document? The href here is not an address but "javascript:return;". Screenshot: https://i.stack.imgur.com/hhVEI.jpg

Is there a way to get the original link to this .html document? Or how can I trigger the HTML download icon from a Python crawler?

Many thanks.

1 answer

  • honestman_ 2022-08-22 13:26

    Look for the real URL right here:

    [image]
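The screenshot is not preserved here, so the attribute that actually holds the link is unknown. As a rough sketch of the idea, assuming the page keeps the real address in some other attribute of the same `<a>` tag, the following lists every URL-like attribute on anchors whose `href` is only a `javascript:` stub. The `data-uri` attribute and the `/hypothetical/download` path in the sample HTML are made up for illustration; inspect the real page to see what it uses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class JsLinkFinder(HTMLParser):
    """Collect URL-like attribute values from <a> tags whose href is a javascript: stub."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []  # list of (attribute name, absolute URL)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # Attributes without a value arrive as None; normalise them to "".
        attr_map = {name: (value or "") for name, value in attrs}
        href = attr_map.get("href", "").strip().lower()
        if not href.startswith("javascript:"):
            return
        for name, value in attr_map.items():
            if name == "href":
                continue
            # Keep anything that looks like an absolute or root-relative URL.
            if value.startswith(("http://", "https://", "/")):
                self.candidates.append((name, urljoin(self.base_url, value)))


def find_hidden_links(html, base_url):
    """Return (attribute, URL) pairs found on javascript: anchors in the given HTML."""
    finder = JsLinkFinder(base_url)
    finder.feed(html)
    return finder.candidates


# Hypothetical markup: the attribute name and path are assumptions, not taken from the real page.
SAMPLE_HTML = """
<a href="javascript:return;" data-uri="/hypothetical/download?format=HTML" class="download">HTML</a>
<a href="/en/home">Home</a>
"""

print(find_hidden_links(SAMPLE_HTML, "https://op.europa.eu"))
```

If no attribute carries the address and the link is only built by a script when the icon is clicked, this approach will return nothing, and the request would have to be found in the browser's network tab instead.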
