duanpao9781 2018-10-02 00:42
Views: 57
Accepted

Simple HTML DOM scrapes only half of the page

I am trying to scrape this url https://nrg91.gr/nrg-airplay-chart/ using simple-html-dom, but it does not seem to get the full html source code. This code:

        include_once('simple_html_dom.php');
        $html = file_get_html('https://nrg91.gr/nrg-airplay-chart');

        echo $html->plaintext;

displays the content up to the h1, just before the content I am after. And from the simple-html-dom manual examples, this should display all links from that url:

        foreach($html->find('a') as $e)
            echo $e->href . '<br>';

but it only displays the links up to the main navigation menu, not from the main body or footer.

I also tried using prerender.com to fully render the URL before passing it to file_get_html, but the result was the same. What am I doing wrong?

3 answers

  • donglou8371 2018-10-02 01:42

    Here's my super dirty approach to fetching the rank/artist/title/youtube data using both DOMDocument and SimpleXML.

    The idea is to locate each "row" of data with the XPath //ul[@id="chart_ul"]/li, then call dom_import_simplexml( $set )->getNodePath() on that row to get its absolute path, and append a sub-expression to that path to select the individual elements that hold the desired data.

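    // Cache the page in the system temp directory and refresh it at most once per hour.
    $temp = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'nrg-airplay-chart.html';
    
    if( file_exists( $temp ) === false or filemtime( $temp ) < time() - 3600 )
    {
      $html = file_get_contents('https://nrg91.gr/nrg-airplay-chart/');
      if( $html === false )
      {
        // Do not cache a failed download.
        exit('Could not download the chart page.');
      }
      file_put_contents( $temp, $html );
    }
    else
    {
      $html = file_get_contents( $temp );
    }
    
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors( true );
    $dom = new DOMDocument();
    $dom->loadHTML( $html );
    libxml_clear_errors();
    $xml = simplexml_import_dom( $dom );
    $array = array();
    
    // Each <li> under the chart list is one row of the chart.
    foreach( $xml->xpath('//ul[@id="chart_ul"]/li') as $index => $set )
    {
      // Absolute path of the current row, used as a prefix for the per-field queries.
      $basexpath = dom_import_simplexml( $set )->getNodePath();
      $array[] = array(
        'ranking' => (string) $xml->xpath( $basexpath . '//span[@id="ranking"]' )[0],
        'artist' => (string) $xml->xpath( $basexpath . '//p[@id="artist"]/b' )[0],
        'title' => (string) $xml->xpath( $basexpath . '//p[@id="title"]' )[0],
        'youtube' => (string) $xml->xpath( $basexpath . '//div[@id="media"]/a/@href' )[0],
      );
    }
    
    print_r( $array );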
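
A possible variant of the same idea, sketched here on an inline HTML string so it can be run without hitting the site: instead of building an absolute path with getNodePath(), it passes each row to DOMXPath::evaluate() as the context node and uses relative expressions (starting with a dot). The markup below is hypothetical and only mimics the structure the XPath expressions above assume; the real page may differ.

```php
<?php
// Hypothetical sample markup (an assumption, not copied from the real page).
$html = '
<html><body>
<ul id="chart_ul">
  <li>
    <span id="ranking">1</span>
    <p id="artist"><b>Artist A</b></p>
    <p id="title">Song A</p>
    <div id="media"><a href="https://example.com/a">video</a></div>
  </li>
  <li>
    <span id="ranking">2</span>
    <p id="artist"><b>Artist B</b></p>
    <p id="title">Song B</p>
    <div id="media"><a href="https://example.com/b">video</a></div>
  </li>
</ul>
</body></html>';

// The repeated id attributes make the parser complain; keep those messages internal.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = array();

// Select every row, then query each field relative to that row.
foreach ($xpath->query('//ul[@id="chart_ul"]/li') as $li) {
    $rows[] = array(
        // string(...) returns the text of the first matching node, or '' if none matches.
        'ranking' => trim($xpath->evaluate('string(.//span[@id="ranking"])', $li)),
        'artist'  => trim($xpath->evaluate('string(.//p[@id="artist"]/b)', $li)),
        'title'   => trim($xpath->evaluate('string(.//p[@id="title"])', $li)),
        'youtube' => $xpath->evaluate('string(.//div[@id="media"]/a/@href)', $li),
    );
}

print_r($rows);
```

Because string() yields an empty string when nothing matches, a row with a missing field produces an empty value here rather than an undefined-offset error from indexing [0] on an empty result.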
    
    Selected by the asker as the best answer.