dpkrh2444 2011-02-25 21:08
Accepted

PHP DOM parser only works for some pages

I'm using http://simplehtmldom.sourceforge.net/ and noticed that, both in its examples and when I try to scrape certain sites, only some of them return results.

I'm using:

include_once('../../simple_html_dom.php');

// Create DOM from URL or file
$website = 'http://www.digg.com/';
$html = file_get_html($website);

// Find all images 
foreach($html->find('img') as $element) 
   echo "<img src=\"" . $website . $element->src . "\"" . '<br>';

This shows a bunch of thumbnails, but most of them are blank, and not all of the page's thumbnails are returned.

Is it because these sites have some sort of .htaccess restrictions on visitors? This happens with multiple websites.

1 answer

  • douchuang1852 2011-02-25 22:16

    You're assuming that $element->src is always relative to $website, which may not be the case.

    For example, $element->src could already be http://www.digg.com/image.jpg, in which case $website . $element->src becomes http://www.digg.com/http://www.digg.com/image.jpg, which won't work.

    Try:

    include_once('../../simple_html_dom.php');
    
    // Create DOM from URL or file
    $website = 'http://www.digg.com/';
    $html = file_get_html($website);
    
    // Find all images 
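    foreach($html->find('img') as $element) {
       // Strip leading slashes so we don't get a double slash after $website
       $src = ltrim($element->src, '/');
       // Strip $website if src is already absolute, so the URL isn't doubled
       $src = str_replace($website, "", $src);
    
       // Note the closing > on the <img> tag
       echo "<img src=\"" . $website . $src . "\">" . '<br>';
    }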
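
The code above still assumes every image lives on $website, so an src that points at another host (a CDN, for example) would end up mangled. A minimal sketch of a more general approach, assuming PHP 7 or later. The name absolute_src is a hypothetical helper for illustration, not part of simple_html_dom, and it deliberately ignores ports, "../" segments and data: URIs:

```php
<?php
// Hypothetical helper (not part of simple_html_dom):
// turn the src of an <img> into an absolute URL, given the page URL.
function absolute_src(string $base, string $src): string
{
    // Already absolute (http:// or https://): use it unchanged.
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';

    // Protocol-relative (//cdn.example.com/a.png): borrow the page's scheme.
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }

    $root = $scheme . '://' . $host;

    // Root-relative (/img/a.png): attach to the site root.
    if (strpos($src, '/') === 0) {
        return $root . $src;
    }

    // Plain relative (img/a.png): attach to the directory of the page URL.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $root . $dir . $src;
}
```

Inside the loop, the echo line would then become something like echo '<img src="' . absolute_src($website, $element->src) . '"><br>'; with no ltrim or str_replace needed.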
    
    The asker selected this as the best answer.