dshmvqnl98119 2016-05-09 22:41

Getting the HTML from the DOM after JavaScript has executed

I may be using incorrect terminology here, so please correct me if I'm wrong.

Here's what I want to do: I'm trying to scrape a website's comments section, but the comments are loaded via an AJAX call after the page has fully loaded. I currently fetch the HTML from the site like this:

res, err := http.Get(url)
if err != nil {
    // handle error
}
defer res.Body.Close()
// res.Body holds only the initial HTML; none of the page's JavaScript has run

This obviously returns the HTML as it is before the AJAX call. How do I go about getting the HTML after the AJAX call?

This is completely off the top of my head, but would I basically need to build a JS renderer in code for this? My guess is that the JS needs to execute somehow. Any suggestions, libraries, or examples on how to go about this? I'd prefer to do this in Go, but realistically it could be in any language.


  • duanpanyang1962 2016-05-09 22:56

    If you own the site or can easily determine (or generate) the URI of the call that loads the comments, it's probably easier to make that same AJAX call yourself rather than bother with DOM parsing or arbitrary JS execution.

    At that point Go would actually be a good language to use, since its JSON and XML standard libraries are excellent for unmarshalling that kind of data.

