douji3623 2017-10-19 16:10

Scraping an entire webpage + CSS + JavaScript [closed]

I'm trying to create a version-control backup/log for a webpage: if the webpage (including its JS and CSS) changes, a static copy is saved to disk.
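A minimal sketch of the "save a static copy when something changes" part, written in Python (the question names no language, so that choice is an assumption). It assumes the page and its assets have already been downloaded into a dict that maps each URL to its raw bytes. `save_if_changed`, the snapshot folder layout, and the `last_fingerprint.txt` marker file are hypothetical names chosen for illustration. Every resource is hashed with SHA-256, and a new snapshot folder is written only when the combined fingerprint differs from the one saved on the previous run.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Optional


def save_if_changed(resources: Dict[str, bytes], archive_dir: Path) -> Optional[Path]:
    """Write a snapshot of `resources` if they differ from the last saved one.

    `resources` maps each URL (the page plus its CSS/JS files) to its raw bytes.
    Returns the new snapshot directory, or None if nothing changed.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    urls = sorted(resources)  # fixed order so the fingerprint is stable

    # Combined fingerprint: each URL followed by the SHA-256 of its content.
    combined = hashlib.sha256()
    for url in urls:
        combined.update(url.encode("utf-8") + b"\0")
        combined.update(hashlib.sha256(resources[url]).digest())
    fingerprint = combined.hexdigest()

    marker = archive_dir / "last_fingerprint.txt"
    if marker.exists() and marker.read_text() == fingerprint:
        return None  # unchanged since the previous run

    # New snapshot folder named by UTC time plus a fingerprint prefix.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    snapshot = archive_dir / f"{stamp}-{fingerprint[:12]}"
    snapshot.mkdir(exist_ok=True)
    manifest = []
    for i, url in enumerate(urls):
        name = f"{i:03d}.bin"  # neutral file names; the manifest records the URLs
        (snapshot / name).write_bytes(resources[url])
        manifest.append(f"{name}\t{url}")
    (snapshot / "manifest.txt").write_text("\n".join(manifest) + "\n")
    marker.write_text(fingerprint)
    return snapshot
```

Storing files under numbered names avoids having to turn arbitrary URLs into safe file names; the manifest keeps the mapping back to each URL.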

How do I get the CSS and JavaScript of a webpage? Getting the HTML is easy: simply connect to the page, read its contents, and return them. But how do I get the page's CSS and JavaScript as well?
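Fetching a resource over the network needs only Python's standard library. A minimal sketch, assuming Python and falling back to UTF-8 when the server declares no charset; `fetch` is a hypothetical helper name and the User-Agent string is an arbitrary placeholder. It returns both the raw bytes (what you would archive) and the decoded text (what you would search or parse).

```python
import urllib.request


def fetch(url: str, timeout: float = 30.0) -> tuple[bytes, str]:
    """Download `url`; return the raw bytes (for archiving) and decoded text (for parsing)."""
    # Placeholder User-Agent; some servers may reject Python's default one.
    request = urllib.request.Request(url, headers={"User-Agent": "page-archiver/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw, raw.decode(charset, errors="replace")
```

The same helper works unchanged for a CSS or JavaScript URL once you know it; the open question is how to find those URLs.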

The system doesn't have direct access to the web server(s), so I have to do everything remotely over the network.

My idea is to search the scraped HTML for ".css" and ".js", take everything up to the first double quote ("), and request each CSS/JavaScript file directly, just like a webpage. But I suspect this might not be very reliable.
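To make the reliability concern concrete, here is roughly what that string-search idea amounts to, approximated (as an assumption about the exact rule) as "grab every double-quoted string that ends in .css or .js". `naive_asset_urls` is a hypothetical name. On a small sample it misses a single-quoted `src`, misses a file that carries a `?v=` query string, and picks up a file name that merely appears in ordinary text:

```python
import re

# Any double-quoted string ending in .css or .js.
NAIVE_ASSET = re.compile(r'"([^"]*\.(?:css|js))"')


def naive_asset_urls(html: str) -> list[str]:
    return NAIVE_ASSET.findall(html)


sample = (
    '<link rel="stylesheet" href="/static/site.css">'
    "<script src='/static/app.js'></script>"         # single quotes: missed
    '<script src="/static/lib.js?v=2"></script>'     # query string: missed
    '<p>Rename "notes.js" before deploying.</p>'     # plain text: false positive
)
print(naive_asset_urls(sample))  # → ['/static/site.css', 'notes.js']
```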

Not sure why this was marked as too broad. I'm asking how to get the CSS and JavaScript of a webpage. I've reworded my question; hopefully it's better now.

1 answer

  • doumi5223 2017-10-19 16:14
    Instead of searching for .js and .css, I'd look for <script> and <link> tags and use their src and href attributes, respectively, to make another network request and retrieve those files for comparison.

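    This will be more reliable because you won't have to worry about the page's text merely containing ".js" or ".css". You could also use a proper parser so that details like single vs. double quotes aren't an issue; since most real-world HTML is not well-formed XML, an HTML parser is a safer choice than a strict XML parser.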
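The approach from this answer can be sketched with Python's built-in html.parser module (an HTML parser used in place of an XML one, which is an assumption about tooling). It collects the src of every <script> and the href of every <link> whose rel includes "stylesheet", and resolves relative paths against the page URL with urljoin. Because the parser tokenizes attributes itself, single-quoted, double-quoted, and unquoted values, upper-case tag names, and entities such as &amp; all come out the same way. `AssetCollector` and `find_assets` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect external <script src> and <link rel="stylesheet" href> URLs."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.scripts: list[str] = []
        self.stylesheets: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; values have their quotes
        # stripped and character references such as &amp; already decoded.
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            self.scripts.append(urljoin(self.page_url, attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            # rel is a space-separated, case-insensitive list of keywords.
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(urljoin(self.page_url, attributes["href"]))


def find_assets(html: str, page_url: str) -> tuple[list[str], list[str]]:
    """Return (script URLs, stylesheet URLs) as absolute URLs, in document order."""
    collector = AssetCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.scripts, collector.stylesheets
```

Each returned URL can then be downloaded and compared in the same way as the page itself. This sketch does not handle <base href>, CSS @import rules or url(...) references inside stylesheets, or scripts that JavaScript adds at runtime; inline <style> and <script> blocks need no extra request because they are already part of the HTML.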
