douji3623 2017-10-19 16:10

Scrape an entire webpage + CSS + JavaScript [closed]

I'm trying to build a version-control backup / log for a webpage: whenever the page (including its JS and CSS) is altered, it saves a static copy to disk.

How do I get the CSS and JavaScript of a webpage? Getting the HTML is easy: connect to the page, read the contents, and return them. But how do I get the page's CSS and JavaScript too?

The system doesn't have direct access to the web server(s), so I have to do everything remotely over the network.

My idea is to search the scraped HTML for '.css' and '.js', take everything up to the nearest quote ("), and then request the CSS / JavaScript file directly as if it were a webpage. But I think this might not be very reliable?

Not sure why this is marked as too broad. I'm asking how to get the CSS and JavaScript of a webpage. I've reworded my question; hopefully it's better now.

1 answer

  • doumi5223 2017-10-19 16:14
    Rather than searching for .js and .css, I'd look for <script> and <link> tags and use their src and href attributes, respectively, to make another network request and retrieve those files for comparison.

    This is more reliable because you won't have to worry about the page's text content containing "js" or "css", and you can use an HTML parser (or an XML parser, if the markup is well-formed) so that things like single vs. double quotes aren't an issue.
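
    As a rough illustration of the tag-based approach, here is a minimal Python sketch using only the standard library. The names AssetCollector, collect_assets and fetch_digest are made up for this example, and the choice of Python and of SHA-256 for the comparison step is an assumption, not something the question specifies. It only collects <script src> and <link rel="stylesheet" href> URLs and resolves them against the page URL.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AssetCollector(HTMLParser):
    """Hypothetical helper: gathers external script and stylesheet URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.scripts = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag/attribute names and strips the quotes,
        # so single vs. double quotes make no difference here.
        attr = dict(attrs)
        if tag == "script" and attr.get("src"):
            self.scripts.append(urljoin(self.base_url, attr["src"]))
        elif tag == "link" and attr.get("href"):
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(urljoin(self.base_url, attr["href"]))


def collect_assets(html, base_url):
    """Return (script_urls, stylesheet_urls) as absolute URLs."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.scripts, parser.stylesheets


def fetch_digest(url):
    """Download a resource and return its SHA-256 hex digest for comparison."""
    with urlopen(url, timeout=10) as response:
        return hashlib.sha256(response.read()).hexdigest()


if __name__ == "__main__":
    page = (
        "<html><head>"
        "<link rel='stylesheet' href='/css/site.css'>"
        '<script src="js/app.js"></script>'
        "</head><body><p>Edit main.css and main.js by hand.</p></body></html>"
    )
    print(collect_assets(page, "https://example.com/blog/"))
```

    The mention of main.css and main.js in the paragraph text is ignored, since only tag attributes are inspected. Inline <script> and <style> blocks need no extra request because they are already part of the HTML you fetched. Storing the digest from fetch_digest per URL and comparing it on the next run is one way to decide when to save a new copy.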
