douzhang1926 2016-10-07 15:58
Views: 180

Simple HTML DOM: getting dynamic content loaded with JS

I'm trying to get dynamically loaded content from a web page, specifically the options loaded into a select. So if I do:

$options = $html->find('select[class=theSelectClass]')[0]->find('option');
foreach($options as $option){
     echo $option->text().'<br>';
}

This works as expected and my output is:

Select an option

Why? Because the other options are loaded with JS after the page loads. So my question is: how can I get these dynamically loaded options inside the select?

This is my attempt using JS Ajax and another PHP page.

In my PHP file that includes simple_html_dom:

$html->load_file($base);
$var = '<script>

    var xhttp = new XMLHttpRequest();
    xhttp.onreadystatechange = function() {
        if (this.readyState == 4 && this.status == 200) {
           this.responseText;
        }
    };
    xhttp.open("GET", "http://localhost/crawler/ajax.php?param=HelloWorld", true);
    xhttp.send();

</script>';
$e = $html->find("body", 0);
$e->outertext = $e->makeup() . $e->innertext . $var . '</body>';

And my ajax.php file:

file_put_contents ( 'ajax.txt' , $_GET['param']);

I was trying to see if I could send an Ajax call from the loaded HTML file, but I feel far from being able to do it. So how can I make this happen?

Thank you


1 answer

  • douyi4912 2016-10-07 16:06

    It might be easier to first render the page with a headless browser and then pass the rendered HTML to Simple HTML DOM. You could do this with CasperJS/PhantomJS or another tool that executes the page's JavaScript.

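    ```php
    <?php
    require "vendor/autoload.php";

    use Sunra\PhpSimple\HtmlDomParser;
    use Browser\Casper;

    $casper = new Casper();
    // Forward options to PhantomJS, for example to ignore SSL errors.
    $casper->setOptions(array(
        'ignore-ssl-errors' => 'yes'
    ));
    $casper->start('https://www.reddit.com');
    // Give the page's JavaScript 5 seconds to load its content.
    $casper->wait(5000);
    $casper->run();
    // Raw CasperJS output; read it after run(), which is when CasperJS actually executes.
    $output = $casper->getOutput();
    // HTML of the page as rendered by the headless browser.
    $html = $casper->getHtml();
    // Parse the rendered markup with Simple HTML DOM as usual.
    $dom = HtmlDomParser::str_get_html($html);
    $elems = $dom->find("a");
    foreach ($elems as $e) {
        print_r($e->href);
    }
    ```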
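
To see why the parser alone only returns "Select an option", and why rendering first helps, here is a minimal sketch. It swaps in PHP's built-in DOMDocument for Simple HTML DOM so that it runs without any dependencies; the helper `selectOptionLabels()`, the two HTML strings, and the extra option names are made up for illustration and are not taken from the page in the question.

```php
<?php
// Hypothetical helper: returns the text of every <option> inside the
// <select> with the given class, using PHP's built-in DOM extension.
function selectOptionLabels(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml raises on imperfect real-world HTML.
    @$doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    $labels = [];
    foreach ($xpath->query("//select[@class='$class']/option") as $option) {
        $labels[] = trim($option->textContent);
    }
    return $labels;
}

// What the server sends: one placeholder option plus a script that would
// add more options in a browser. An HTML parser never executes that script.
$static = '<html><body><select class="theSelectClass">'
        . '<option>Select an option</option></select>'
        . '<script>/* fills the select via Ajax in a real browser */</script>'
        . '</body></html>';

// What a headless browser could hand back after the script has run
// (assumed markup; the option names are invented).
$rendered = '<html><body><select class="theSelectClass">'
          . '<option>Select an option</option>'
          . '<option>First</option><option>Second</option>'
          . '</select></body></html>';

print_r(selectOptionLabels($static, 'theSelectClass'));
print_r(selectOptionLabels($rendered, 'theSelectClass'));
```

The same parsing code sees the extra options only when it is given the rendered markup, which is the reason for putting a headless browser in front of the parser rather than injecting a script into the parsed document.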
