duandangqin0559 2014-08-22 15:18

Extracting text and image src with PHP DOMDocument

I'm trying to extract the img src and the text of the TDs inside the div with id="Ajax", but my code fails to extract the img: it simply ignores the img src. How can I also extract the img src and add it to the array?

HTML:

<div id="Ajax">
<table cellpadding="1" cellspacing="0">
<tbody>
<tr id="comment_1">
<td>20:28</td>
<td class="color">
</td>
<td class="last_comment">
Text<br/>
</td>
</tr>
<tr id="comment_2">
<td>20:25</td>
<td class="color">
</td>
<td class="comment">
Text 2<br/>
</td>
</tr>
<tr id="comment_3">
<td>20:24</td>
<td class="color">
<img src="http://url.ext/img/image02.jpeg" alt="img alt 2"/>
</td>
<td class="comment">
Text 3<br/>
</td>
</tr>
<tr id="comment_4">
<td>20:23</td>
<td class="color">
<img src="http://url.ext/img/image01.jpeg" alt="img alt"/>
</td>
<td class="comment">
Text 4<br/>
</td>
</tr>
</tbody>
</table>
</div>

PHP:

$html = file_get_contents($url);

$doc = new DOMDocument();
@$doc->loadHTML($html);
$contentArray = array();
$doc = $doc->getElementById('Ajax');
$text = $doc->getElementsByTagName ('td');
foreach ($text as $t)
{
$contentArray[] = $t->nodeValue;
}
print_r ($contentArray);

Thanks.

1 answer

  • dtnqbre7980007 2014-08-22 17:46

    You're using $t->nodeValue to obtain the content of a node. nodeValue returns only the node's text content; an <img> element is empty and keeps its URL in the src attribute, so there is nothing to return. The easiest way to get the src attribute would be XPath.

    Example:

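    $html = file_get_contents($url);

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of silencing them with @
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    $expression = "//div[@id='Ajax']//tr";
    $nodes = $xpath->query($expression); // Get all rows (tr) in the div

    $imgSrcExpression = ".//img/@src";
    $firstTdExpression = "./td[1]";
    $contentArray = array();
    foreach ($nodes as $node) { // loop over each row
      // select the first td node, relative to the current row
      $tdNodes = $xpath->query($firstTdExpression, $node);
      $tdVal = null;
      if ($tdNodes->length > 0) {
        $tdVal = $tdNodes->item(0)->nodeValue;
      }

      // select the src attribute of the img node (stays null if the row has no img)
      $imgNodes = $xpath->query($imgSrcExpression, $node);
      $imgVal = null;
      if ($imgNodes->length > 0) {
        $imgVal = $imgNodes->item(0)->nodeValue;
      }

      // store both values so each row ends up in the array
      $contentArray[] = array('td' => $tdVal, 'src' => $imgVal);
    }
    print_r($contentArray);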
    

    (Caution: Code may contain typos)
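
For comparison, the same idea can be sketched without XPath, staying close to the question's original getElementsByTagName approach and reading the attribute with getAttribute('src'). This is only a minimal sketch under a few assumptions: the inline $html sample is a shortened copy of the question's markup (two rows), and the 'text' and 'src' array keys are names chosen here for illustration, not something from the original post.

```php
<?php
// Sample markup modelled on the question's HTML, shortened to two rows (assumption).
$html = <<<HTML
<div id="Ajax">
<table>
<tbody>
<tr id="comment_2">
<td>20:25</td>
<td class="color"></td>
<td class="comment">Text 2<br/></td>
</tr>
<tr id="comment_3">
<td>20:24</td>
<td class="color"><img src="http://url.ext/img/image02.jpeg" alt="img alt 2"/></td>
<td class="comment">Text 3<br/></td>
</tr>
</tbody>
</table>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$contentArray = array();
$container = $doc->getElementById('Ajax');
if ($container !== null) {
    foreach ($container->getElementsByTagName('tr') as $row) {
        // 'text' and 'src' are illustrative key names
        $entry = array('text' => array(), 'src' => null);
        foreach ($row->getElementsByTagName('td') as $td) {
            // nodeValue gives the cell's text; an <img> contributes nothing here
            $entry['text'][] = trim($td->nodeValue);
            // the image URL lives in the src attribute, so read it explicitly
            $imgs = $td->getElementsByTagName('img');
            if ($imgs->length > 0) {
                $entry['src'] = $imgs->item(0)->getAttribute('src');
            }
        }
        $contentArray[] = $entry;
    }
}
print_r($contentArray);
```

Each row becomes one array entry holding the text of all its cells plus the image URL, or null when the row has no image. XPath remains the more compact choice when only a few specific cells are needed.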

    This answer was accepted by the asker as the best answer.
