Simple HTML DOM scrapes only half of the page

I am trying to scrape the URL https://nrg91.gr/nrg-airplay-chart/ using simple-html-dom, but it does not seem to get the full HTML source. This code:

        include_once('simple_html_dom.php');
        $html = file_get_html('https://nrg91.gr/nrg-airplay-chart');

        echo $html->plaintext;

displays the content only up to the h1, just before the content I am after. According to the examples in the simple-html-dom manual, this should display all links on that page:

        foreach($html->find('a') as $e) {
            echo $e->href . '<br>';
        }

but it only displays the links up to the main navigation menu, not those in the main body or the footer.

I also tried using prerender.com to fully load the URL before passing it to file_get_html, but the result was the same. What am I doing wrong?
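For comparison with the find('a') loop above, here is a minimal sketch that lists every link with PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of simple-html-dom. The HTML string is a made-up stand-in so the sketch runs offline; against the real page you would load the output of file_get_contents() instead. If this approach finds the body and footer links in the same HTML where simple-html-dom stops at the navigation menu, that suggests the download is complete and the problem lies in how simple-html-dom parses the page.

```php
<?php
// Sketch: list every <a href> using PHP's built-in DOM extension.
// $html is a hypothetical stand-in; for the real page you might use
// $html = file_get_contents('https://nrg91.gr/nrg-airplay-chart/');
$html = <<<'HTML'
<html><body>
  <nav><a href="/">Home</a></nav>
  <main><a href="/chart">Chart</a></main>
  <footer><a href="/contact">Contact</a></footer>
</body></html>
HTML;

// libxml warns about tags it does not know (nav, main, footer);
// collect those warnings quietly instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select every anchor that has an href attribute, anywhere in the document.
$xpath = new DOMXPath($dom);
$links = [];
foreach ($xpath->query('//a[@href]') as $a) {
    $links[] = $a->getAttribute('href');
}

echo implode("\n", $links), "\n";
```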


3 Answers

donglou8371 2018-10-02 01:42


Here's my super dirty approach to fetching the rank/artist/title/youtube data using both DOMDocument and SimpleXML.

The idea is to locate each "row" of data with the XPath //ul[@id="chart_ul"]/li, then use dom_import_simplexml( $set )->getNodePath() to get that row's absolute path and append a suffix to it, building a new XPath that selects the individual elements where the desired data is located.
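The row-path idea can be tried offline first. The sketch below runs the same two-step query (row XPath, then getNodePath() plus a // suffix) on a small hand-written fragment. The markup is hypothetical, shaped only to match the XPath expressions used in this answer; the real page's structure may differ or may have changed since 2018.

```php
<?php
// Sketch of the "row path + suffix" technique on hypothetical markup
// that mirrors the answer's XPath expressions.
$html = <<<'HTML'
<html><body>
<ul id="chart_ul">
  <li><span id="ranking">1</span><p id="artist"><b>Artist A</b></p><p id="title">Song A</p></li>
  <li><span id="ranking">2</span><p id="artist"><b>Artist B</b></p><p id="title">Song B</p></li>
</ul>
</body></html>
HTML;

// Repeated id attributes make libxml warn; collect the warnings quietly.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xml = simplexml_import_dom($dom);

$titles = [];
foreach ($xml->xpath('//ul[@id="chart_ul"]/li') as $row) {
    // Absolute path of this <li>, something like /html/body/ul/li[2]
    $base = dom_import_simplexml($row)->getNodePath();
    // Prefixing the row's path scopes the second query to that row only
    $titles[] = (string) $xml->xpath($base . '//p[@id="title"]')[0];
}

print_r($titles);
```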

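    <?php
    // Cache the page in the temp dir so it is downloaded at most once an hour.
    $temp = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'nrg-airplay-chart.html';

    if( !file_exists( $temp ) || filemtime( $temp ) < time() - 3600 )
    {
      $html = file_get_contents('https://nrg91.gr/nrg-airplay-chart/');
      file_put_contents( $temp, $html );
    }
    else
    {
      $html = file_get_contents( $temp );
    }

    // Real-world HTML makes libxml emit warnings; collect them instead of printing them.
    libxml_use_internal_errors( true );
    $dom = new DOMDocument();
    $dom->loadHTML( $html );
    libxml_clear_errors();

    $xml = simplexml_import_dom( $dom );
    $array = array();

    // Each <li> of the chart list is one row.
    foreach( $xml->xpath('//ul[@id="chart_ul"]/li') as $set )
    {
      // Absolute path to this row (such as /html/body/.../li[3]), used as a prefix below.
      $basexpath = dom_import_simplexml( $set )->getNodePath();
      // "?? ''" keeps a missing field from raising an undefined-offset warning.
      $array[] = array(
        'ranking' => (string) ( $xml->xpath( $basexpath . '//span[@id="ranking"]' )[0] ?? '' ),
        'artist'  => (string) ( $xml->xpath( $basexpath . '//p[@id="artist"]/b' )[0] ?? '' ),
        'title'   => (string) ( $xml->xpath( $basexpath . '//p[@id="title"]' )[0] ?? '' ),
        'youtube' => (string) ( $xml->xpath( $basexpath . '//div[@id="media"]/a/@href' )[0] ?? '' ),
      );
    }

    print_r( $array );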
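A possible variant, not the answer's own code: DOMXPath::query() accepts a context node, so each row can be searched with a relative .// expression, which avoids the round trip through getNodePath(). The markup and the firstText() helper below are hypothetical illustrations, and the result is a sketch under the same assumption that the page matches the answer's XPath expressions.

```php
<?php
// Variant sketch: relative XPath queries evaluated against each row node.
// The markup is hypothetical, shaped to match the answer's XPath expressions.
$html = <<<'HTML'
<html><body>
<ul id="chart_ul">
  <li>
    <span id="ranking">1</span>
    <p id="artist"><b>Artist A</b></p>
    <p id="title">Song A</p>
    <div id="media"><a href="https://example.com/video-a">Video</a></div>
  </li>
</ul>
</body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: trimmed text (or attribute value) of the first
// node matching $expr inside $row, or '' when nothing matches.
function firstText(DOMXPath $xpath, string $expr, DOMNode $row): string
{
    $nodes = $xpath->query($expr, $row);
    return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->textContent) : '';
}

$chart = [];
foreach ($xpath->query('//ul[@id="chart_ul"]/li') as $row) {
    $chart[] = [
        'ranking' => firstText($xpath, './/span[@id="ranking"]', $row),
        'artist'  => firstText($xpath, './/p[@id="artist"]/b', $row),
        'title'   => firstText($xpath, './/p[@id="title"]', $row),
        'youtube' => firstText($xpath, './/div[@id="media"]/a/@href', $row),
    ];
}

print_r($chart);
```

Note the leading dot: with a context node, .//span searches only inside that row, whereas //span would still search the whole document.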

This answer was accepted as the best answer by the asker.
