Scrape an entire web page + CSS + JavaScript [closed]

I'm trying to create a version-control-style backup/log for a webpage: whenever the page (including its JS and CSS) is altered, a static copy is saved to disk.

How do I get the CSS and JavaScript of a webpage? Getting the HTML is easy: connect to the page, read the contents, and return them. But how do I get the page's CSS and JavaScript too?

The system doesn't have direct access to the web server(s), so I have to do everything remotely over the network.

My idea is to search the scraped HTML for ".css" and ".js", take everything up to the first quote ("), and then request each CSS/JavaScript file directly, as if it were a webpage. But I suspect this might not be very reliable?

Not sure why this is marked as too broad. I'm asking how to get the CSS and JavaScript of a webpage. I rephrased my question; hopefully it's better now.

1 answer

doumi5223 2017-10-19 16:14
Instead of searching for .js and .css, I'd look for <script> and <link> tags and use their src and href attributes, respectively, to make further network requests and retrieve those files for comparison.

This will be more reliable because page content that merely mentions "js" or "css" won't produce false matches, and if you use a parser (an HTML parser, or an XML parser when the markup is well-formed), differences such as single versus double quotes stop being an issue.
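
As a rough illustration of this approach, here is a minimal Python sketch using only the standard library. The names `AssetCollector`, `extract_asset_urls`, `fetch`, and `save_if_changed` are hypothetical helpers invented for this example, and comparing SHA-256 digests is just one assumed way to do the "comparison" step; neither the question nor the answer prescribes it. The sketch only finds files referenced by `<script src>` and `<link rel="stylesheet" href>`; anything pulled in another way (for example, CSS `@import` rules or scripts injected at runtime) would need extra handling.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen


class AssetCollector(HTMLParser):
    """Collect absolute URLs from <script src> and <link rel=stylesheet href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips the quotes,
        # so single vs. double quotes make no difference here.
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # rel can hold several space-separated tokens.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.assets.append(urljoin(self.base_url, attrs["href"]))


def extract_asset_urls(html, base_url):
    """Return the script and stylesheet URLs referenced by an HTML string."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.assets


def fetch(url):
    """Download a URL and return the raw response body as bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def save_if_changed(content, name, directory):
    """Write content (bytes) to directory/name unless an identical copy exists.

    Returns True if a new copy was written, False if nothing changed.
    """
    target = Path(directory) / name
    new_digest = hashlib.sha256(content).hexdigest()
    if target.exists():
        old_digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if old_digest == new_digest:
            return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return True
```

A backup run could then call `fetch` on the page URL, decode the result, pass it to `extract_asset_urls`, and call `fetch` and `save_if_changed` once per returned URL. This version overwrites the stored copy; to keep a history, you would add something like a timestamp to the file name.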
