Crawling: How to handle href=javascript:return;?

I am currently crawling documents from the EU's public website: https://op.europa.eu/en/publication-detail/-/publication/c2c32dd3-f83c-11ec-b94a-01aa75ed71a1/language-en/format-PDF/source-264104800
?%ra=link
How can I get the URL of the HTML version of the document? The href on the download icon is not an address but "javascript:return;" (screenshot: https://i.stack.imgur.com/hhVEI.jpg).

Is there any way to get the original link for this .html document? Alternatively, how can I trigger the HTML download icon from a Python crawler?

Many thanks.

1 answer
honestman_ 2022-08-22 13:26
Look for the real URL right here
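
One way to act on this from Python, as a rough sketch only: when an anchor's href is a `javascript:` stub, the real address is often kept in another attribute of the same tag (or is built by a script or a separate request that you can see in the browser's developer tools). The example below scans anchors with a `javascript:` href and reports URL-like strings found in their other attributes. The sample markup, the `data-uri` attribute, and the `openDoc` handler are hypothetical and are not taken from the op.europa.eu page; inspect the actual element to see where its URL lives.

```python
from html.parser import HTMLParser
import re

# Matches absolute http(s) URLs or root-relative paths inside attribute values.
URL_PATTERN = re.compile(r"""(https?://[^\s'"<>)]+|/[^\s'"<>)]+)""")


class JsLinkFinder(HTMLParser):
    """Collect URL-like values from <a> tags whose href is a javascript: stub."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # list of (attribute name, URL-like string)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        href = attr_map.get("href", "").strip().lower()
        if not href.startswith("javascript:"):
            return  # ordinary link, nothing hidden to recover
        for name, value in attr_map.items():
            if name == "href":
                continue
            for match in URL_PATTERN.findall(value):
                self.candidates.append((name, match))


def find_hidden_urls(html):
    """Return (attribute, url) pairs found on javascript: anchors."""
    finder = JsLinkFinder()
    finder.feed(html)
    return finder.candidates


# Hypothetical markup for illustration; the real page may look different.
SAMPLE_HTML = """
<a href="javascript:return;" class="download" data-uri="https://example.org/doc/1234.html">HTML</a>
<a href="/plain/link.html">Plain</a>
<a href="javascript:return;" onclick="openDoc('/resource/doc-5678/format-HTML')">HTML</a>
"""

if __name__ == "__main__":
    for attr_name, url in find_hidden_urls(SAMPLE_HTML):
        print(attr_name, url)
```

If no attribute carries the address, the link is probably produced by script at click time. In that case, either replay the request shown in the browser's network tab, or drive a real browser with an automation tool and click the icon.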

Question created on August 22