dpkrh2444 2011-02-25 21:08

PHP DOM Parser only works for some pages

I'm using Simple HTML DOM (http://simplehtmldom.sourceforge.net/). Both with its bundled examples and with my own attempts to scrape certain sites, only some pages return results.

I'm using:

include_once('../../simple_html_dom.php');

// Create DOM from URL or file
$website = 'http://www.digg.com/';
$html = file_get_html($website);

// Find all images 
foreach($html->find('img') as $element) 
   echo "<img src=\"" . $website . $element->src . "\"" . '<br>';

This prints a bunch of thumbnails, but most of them are blank, and not all of the page's thumbnails are returned.

Is this because the sites use some kind of .htaccess restriction to block scrapers? It happens with multiple websites.


1 answer (accepted)

  • douchuang1852 2011-02-25 22:16

    You're assuming that $element->src is always relative to $website, which may well not be the case.

    For example, $element->src could already be http://www.digg.com/image.jpg, in which case $website . $element->src becomes http://www.digg.com/http://www.digg.com/image.jpg, which is not a valid image URL.

    Try:

    include_once('../../simple_html_dom.php');
    
    // Create DOM from URL or file
    $website = 'http://www.digg.com/';
    $html = file_get_html($website);
    
    // Find all images
    foreach ($html->find('img') as $element) {
        // Strip leading slashes so we don't end up with a double slash
        $src = ltrim($element->src, '/');
        // Strip the site prefix so we don't end up with a doubled URL
        $src = str_replace($website, '', $src);
    
        // Note: an image hosted on a different domain would still get a wrong URL here
        echo '<img src="' . $website . $src . '"><br>';
    }
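
The snippet above only covers sources that are relative to, or already prefixed with, $website. As a rough sketch of a more general approach, the function below distinguishes absolute, protocol-relative, root-relative and path-relative sources. The name resolve_src is hypothetical (it is not part of simple_html_dom), and the code assumes PHP 7 or later and that $base is a full URL with scheme and host. It ignores ports, does not normalize ../ segments, and does not treat data: URIs specially.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): build an absolute URL
// for an <img> src found on the page at $base.
// Assumption: $base includes a scheme and host, e.g. 'http://www.digg.com/'.
function resolve_src(string $base, string $src): string
{
    $src = trim($src);

    // Already absolute (http://... or https://...): use it unchanged.
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';

    // Protocol-relative (//cdn.example.com/a.png): borrow the page's scheme.
    if (substr($src, 0, 2) === '//') {
        return $scheme . ':' . $src;
    }

    $origin = $scheme . '://' . $host;

    // Root-relative (/img/a.png): append to the site origin.
    if (substr($src, 0, 1) === '/') {
        return $origin . $src;
    }

    // Path-relative (img/a.png): append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $src;
}
```

Inside the loop, this would replace the two string clean-up steps, for example: echo '<img src="' . resolve_src($website, $element->src) . '"><br>';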
    