weixin_33721344 2016-02-01 17:33

Loading dynamic HTML with Node.js

I'm pretty new to Node.js.
I'm trying to download some HTML from a website in order to parse it and present some information for debugging.
I succeeded with the http module (see this post), but when I print each chunk this way:

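var http = require("http");

// `options` (host, path, ...) is defined as in the linked post
var req = http.request(options, function(res) {
    res.setEncoding("utf8");
    res.on("data", function (chunk) {
       console.log(chunk);
    });
});
req.end(); // http.request() does not send anything until end() is called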

I don't get the HTML that is loaded dynamically with Ajax. For instance:

<div class="container">
  ::before
      <div class="row">
        ::before
....
</div>

Is there any other module that can help me with this goal?

Thanks!

update

I would like to share with you my success (thanks to @oKonyk).

  • npm install phantomjs
  • create your script
  • use the same code suggested by @oKonyk

Note that if you're running your script locally, you need to set these options:

var options = { 'web-security': 'no' };
phantom.create({parameters: options}, function() {});

1 answer

  • local-host 2016-02-01 18:31

    To capture dynamically built pages, you have to render them in a browser. There are several ways to do that with Node.js.

    I would suggest using PhantomJS, which is a so-called headless browser.

    As a proof of concept, you can install it globally with npm install phantomjs -g. Create a test script 'google.js' with the following content:

    var page = require('webpage').create();
    console.log('The default user agent is ' + page.settings.userAgent);
    page.settings.userAgent = 'SpecialAgent';
    page.open('http://www.google.org', function(status) {
      if (status !== 'success') {
        console.log('Unable to access network');
      } else {
        var html = page.evaluate(function() {
          return document.getElementsByTagName('html')[0].innerHTML;
        });
        console.log(html);
      }
      phantom.exit();
    });
    

    Then run it with phantomjs google.js

    This prints the whole DOM of the page (at least everything within the <html> tags), which is different from the raw response that you get with the http module.
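
To make that difference concrete, here is a minimal sketch that uses only Node's built-in http module. The local server, the page markup, and the helper names fetchRaw and demo are made up for illustration. The point is only that the raw response carries the script source, not the content the script would add once a browser runs it.

```javascript
var http = require('http');

// Hypothetical page: the <div> is empty in the markup and is only
// filled in when a browser executes the inline script.
var PAGE = '<html><body><div id="container"></div>' +
  '<script>document.getElementById("container").innerHTML = ' +
  '"<p>" + ["loaded", "by", "script"].join(" ") + "</p>";</script>' +
  '</body></html>';

// Fetch a URL with the http module and hand the full body to the callback.
function fetchRaw(url, callback) {
  http.get(url, function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, body); });
  }).on('error', callback);
}

// Serve PAGE on a free local port, fetch it once, then shut the server down.
function demo(callback) {
  var server = http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/html', 'Connection': 'close' });
    res.end(PAGE);
  });
  server.listen(0, '127.0.0.1', function () {
    var url = 'http://127.0.0.1:' + server.address().port + '/';
    fetchRaw(url, function (err, body) {
      server.close();
      callback(err, body);
    });
  });
}

demo(function (err, body) {
  if (err) { throw err; }
  // The empty container is in the raw response...
  console.log(body.indexOf('<div id="container"></div>') !== -1); // true
  // ...but the paragraph the script would create is not.
  console.log(body.indexOf('<p>loaded by script</p>') !== -1);    // false
});
```

A headless browser executes the script before the DOM is read back, which is why the PhantomJS output contains the generated markup.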

    Later you can use phantom within your node project (more info here).
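
One possible way to do that, sketched here with Node's built-in child_process module, is to launch the script from above as a separate process and collect what it prints. This is not necessarily the approach the linked page describes. The helper name runScript is hypothetical, and the usage comment assumes that phantomjs is on your PATH and that google.js is in the current directory.

```javascript
var execFile = require('child_process').execFile;

// Run an external script and pass everything it prints to the callback.
// `binary` is the executable to launch, e.g. 'phantomjs' when it is on PATH.
function runScript(binary, args, callback) {
  // Rendered pages can be large, so raise the default output buffer limit.
  var opts = { maxBuffer: 10 * 1024 * 1024 };
  execFile(binary, args, opts, function (err, stdout) {
    if (err) { return callback(err); }
    callback(null, stdout);
  });
}

// Usage (assumption: phantomjs is installed and google.js exists):
// runScript('phantomjs', ['google.js'], function (err, output) {
//   if (err) { return console.error('PhantomJS failed: ' + err.message); }
//   console.log(output.length + ' characters printed by the script');
// });
```

Note that google.js also prints the user agent line, so the output is not pure HTML unless that console.log call is removed.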
