dongyan3616
2013-07-09 12:48

PHP DOMDocument: loadHTML and getElementsByTagName return nothing

$urlToScrap = "https://play.google.com/store/apps/details?id=flipboard.app#?t=W251bGwsMSwxLDIxMiwiZmxpcGJvYXJkLmFwcCJd";
$pageContentData = file_get_contents($urlToScrap);
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$listOfDivs = $doc->getElementsByTagName("div");
foreach ($listOfDivs as $div) {
    if($div->getAttribute("class") == "doc-banner-icon"){
        $img = $div->getElementsByTagName("img");
        var_dump($img->getAttribute("src"));
    }
}

This returns nothing.

I have the following elements in the dom:

<div class="doc-banner-icon"><img src="somesrc"></div>

I'm trying to get the img src. Since the page contains many images, I would like to get the parent div first and then extract the image inside it.

The solution is here:

$urlToScrap = "https://play.google.com/store/apps/details?id=flipboard.app#?t=W251bGwsMSwxLDIxMiwiZmxpcGJvYXJkLmFwcCJd";
$pageContentData = file_get_contents($urlToScrap);
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$listOfDivs = $doc->getElementsByTagName("div");
foreach ($listOfDivs as $div) {
    if($div->getAttribute("class") == "doc-banner-icon"){
        $listOfImages = $div->getElementsByTagName("img");
        foreach($listOfImages as $img){
            var_dump($img->getAttribute("src"));
        }
    }
}

1 answer

  • donglue8180 2013-07-09 12:57
    Accepted answer

    You aren't missing anything; var_dump just doesn't show what you'd expect when given a DOMNodeList. Try iterating over the list instead:

    $listOfImages = $doc->getElementsByTagName("img");
    
    foreach ($listOfImages as $img) {
        $imgClass = $img->getAttribute('class');
    
        echo $imgClass;
    }
    

    In your updated question, just change:

    $img->getAttribute("src")
    

    to:

    $img->item(0)->getAttribute("src")
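
To illustrate why item(0) is needed, here is a minimal sketch. It assumes a hypothetical inline HTML string in place of the downloaded page, so it runs without network access. getElementsByTagName() returns a DOMNodeList even when only one element matches, so the element has to be pulled out of the list before getAttribute() can be called on it.

```php
<?php
// Hypothetical inline markup standing in for the downloaded page.
$html = '<html><body><div class="doc-banner-icon"><img src="somesrc"></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') !== 'doc-banner-icon') {
        continue;
    }

    // getElementsByTagName() returns a DOMNodeList, not a single element.
    $images = $div->getElementsByTagName('img');

    // item(0) returns the first node in the list, or null if the list is empty.
    $first = $images->item(0);
    if ($first !== null) {
        $src = $first->getAttribute('src');
        echo $src, PHP_EOL;
    }
}
```

Checking item(0) for null before calling getAttribute() avoids an error when the div happens to contain no img element.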
    

    Given that your selection criteria are fairly complex, you might consider using XPath instead of navigating manually:

    $doc = new DOMDocument();
    $doc->loadHTML($pageContentData);
    
    $xpath = new DOMXPath($doc);
    $img = $xpath->query("//div[@class = 'doc-banner-icon']/img");
    
    var_dump($img->item(0)->getAttribute('src'));
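
Two caveats about the XPath query above may be worth a sketch. First, @class = 'doc-banner-icon' is an exact string comparison, so it would not match a div that carries additional classes. Second, loadHTML() can emit warnings on markup libxml considers invalid. The example below is only an illustration under assumptions: the inline HTML and the extra "large" class are hypothetical, and whether the real page needs either workaround is not confirmed here.

```php
<?php
// Hypothetical markup: the div carries an extra class, which an exact
// @class = '...' comparison would not match.
$html = '<html><body>'
      . '<div class="doc-banner-icon large"><img src="somesrc"></div>'
      . '<div class="other"><img src="ignored"></div>'
      . '</body></html>';

// Collect libxml parse warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// Match "doc-banner-icon" as a whole token in a space-separated class list.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' doc-banner-icon ')]/img";
$nodes = $xpath->query($query);

// Gather the src attribute of every matching img element.
$sources = [];
foreach ($nodes as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

Iterating over the query result, rather than calling item(0) directly, also copes with the case where no element matches, since the loop body simply never runs.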
    