dongtan5811
2014-07-22 21:40

Getting specific elements from a page

Accepted

I'm trying to pull some data from my website. It should be simple, but I can't find any good examples or docs, so I'm having a tough time. I'm trying to build an API so my friends can use my blog. Let's assume I have a website at http://www.sample.com, and the HTML source for that website is:

  <div class="container">
    <a href="/mywebsiteblogpost/">
      <h2 class="title">im the best</h2>
    </a>
    <span class="author">Josue Espinosa</span>
    <div class="thumb">
      <img src="http://www.sample.com/imgsrc" alt="">
      <span class="category">sports</span>
    </div>
    <p>preview text</p>
    <a class="more" href="/mywebsiteblogpost/">full text...</a>
  </div>

From the children of each .container, I want to get: the href value of the first a element, the text of the .title and .author elements, the src of the img inside .thumb, and the text of .category.

I started with the anchor's href, but I didn't even get that far. I expected $title to hold the href value of the first anchor tag inside .container, but it doesn't work.

$text = file_get_contents('http://www.sample.com');
$doc = new DOMDocument('1.0');
$doc->loadHTML($text);
foreach($doc->getElementsByTagName('div') AS $div) {
    $class = $div->getAttribute('class');
    if(strpos($class, 'container') !== FALSE) {
        // title doesn't retrieve the href value of title :(
        $title = 'TITLE'.$div->getElementsByTagName('a')->getAttribute('href').'<br>';
        //this echoes all the text in all of the children of $div
        echo $div->textContent.'<br>';
    }
}

Can anyone explain why, please?


3 answers

  • doukangbin9698 (7 years ago)

    The culprit is $div->getElementsByTagName('a')->getAttribute('href'). The first part, $div->getElementsByTagName('a'), returns a list of elements (a DOMNodeList), not a single element. DOMNodeList has no getAttribute() method, so the chained ->getAttribute('href') call fails with a fatal error instead of returning the href.

    To fix this, iterate over the list just as you do with the div tags:

    foreach($div->getElementsByTagName('a') as $a) {
      $href = $a->getAttribute('href');
      if ($href) echo "TITLE$href<br>";
    }
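A self-contained sketch of this approach, run against a trimmed copy of the sample markup from the question. The markup is inlined as an assumption so no network request is needed, and the $hrefs array is added only so the result can be inspected.

```php
<?php
// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors(true);

// Trimmed sample markup from the question, inlined (assumption: no network fetch).
$html = '<div class="container">
  <a href="/mywebsiteblogpost/"><h2 class="title">im the best</h2></a>
  <span class="author">Josue Espinosa</span>
  <a class="more" href="/mywebsiteblogpost/">full text...</a>
</div>';

$doc = new DOMDocument('1.0');
$doc->loadHTML($html);

$hrefs = []; // illustration only: keeps every href found
foreach ($doc->getElementsByTagName('div') as $div) {
    if (strpos($div->getAttribute('class'), 'container') === false) {
        continue;
    }
    // Iterate the DOMNodeList; each item is a DOMElement, which has getAttribute().
    foreach ($div->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // '' when the attribute is missing
        if ($href !== '') {
            $hrefs[] = $href;
            echo "TITLE$href<br>\n";
        }
    }
}
```

Note that the loop reports every anchor, so the "full text..." link is printed too; add a break after the first match if only the first href is wanted.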
    
  • dongzhiyan5693 (7 years ago)

    I made some corrections to the PHP code you posted that doesn't work; maybe it can help you keep going.

    $text = file_get_contents('http://www.sample.com');
    $doc = new DOMDocument('1.0');
    $doc->loadHTML($text);
    foreach($doc->getElementsByTagName('div') AS $div)
    {
        $class = $div->getAttribute('class');
        if(strpos($class, 'container') !== FALSE)
        {
            // getElementsByTagName() returns a DOMNodeList, so take its first item
            $A = null;
            foreach ($div->getElementsByTagName('a') as $value)
            {
                $A = $value;
                break; // only the first <a> is needed
            }
            if ($A !== null)
            {
                $title = 'TITLE'. $A->getAttribute('href').'<br>';
                echo $title;
            }
            // this echoes all the text in all of the children of $div
            echo $div->textContent.'<br>';
        }
    }
    
  • duanou3868 (7 years ago)

    OK, so first:

    $div->getElementsByTagName('a')
    

    returns a DOMNodeList object (http://php.net/manual/en/class.domnodelist.php), not a single element. You need to take the first item from it, for example with ->item(0), before you can read the attribute.
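A minimal sketch of taking the first item, using a one-line fragment of the question's markup (inlined here as an assumption). It checks the type of the returned list and guards against an empty list, where item(0) returns null.

```php
<?php
// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument('1.0');
$doc->loadHTML('<div class="container"><a href="/mywebsiteblogpost/"><h2 class="title">im the best</h2></a></div>');

$links = $doc->getElementsByTagName('a');
var_dump($links instanceof DOMNodeList); // bool(true)
echo $links->length, "\n";               // 1

// item(0) returns the first DOMElement, or null when the list is empty.
$first = $links->item(0);
$href = $first !== null ? $first->getAttribute('href') : '';
echo $href, "\n";                        // /mywebsiteblogpost/
```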

    Second:

    $div->textContent
    

    Does this do what you intended? It returns all the text content inside $div, including the text of every child element.
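A short sketch of the difference, assuming a reduced version of the question's markup: textContent on the container merges the text of all its descendants, while reading one field means selecting that element first and trimming its whitespace.

```php
<?php
// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument('1.0');
$doc->loadHTML('<div class="container"><h2 class="title">  im the best  </h2><span class="author">Josue Espinosa</span></div>');

// textContent on the container concatenates the text of every descendant node.
$div = $doc->getElementsByTagName('div')->item(0);
$all = $div->textContent;
echo $all, "\n";

// To read a single field, select that element first and trim surrounding whitespace.
$title = trim($doc->getElementsByTagName('h2')->item(0)->textContent);
echo $title, "\n"; // im the best
```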

    You may be better off using XPath queries (http://php.net/manual/en/class.domxpath.php) for this type of DOM searching.
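A sketch of the XPath approach that collects every field the question asks for. Assumptions: the markup is the sample from the question, inlined instead of fetched with file_get_contents(); hasClass() and firstValue() are hypothetical helper names introduced here, not part of PHP's DOM API.

```php
<?php
// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors(true);

// Sample markup from the question (assumption: inlined rather than fetched).
$html = '<div class="container">
  <a href="/mywebsiteblogpost/">
    <h2 class="title">im the best</h2>
  </a>
  <span class="author">Josue Espinosa</span>
  <div class="thumb">
    <img src="http://www.sample.com/imgsrc" alt="">
    <span class="category">sports</span>
  </div>
  <p>preview text</p>
  <a class="more" href="/mywebsiteblogpost/">full text...</a>
</div>';

$doc = new DOMDocument('1.0');
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: XPath 1.0 predicate matching a whole class token,
// so "container" does not also match e.g. "container-fluid".
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Hypothetical helper: trimmed text (element) or value (attribute) of the
// first node matched relative to $context, or '' when nothing matches.
function firstValue(DOMXPath $xpath, string $query, DOMNode $context): string
{
    $nodes = $xpath->query($query, $context);
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }
    return trim($nodes->item(0)->nodeValue);
}

$posts = [];
foreach ($xpath->query('//div[' . hasClass('container') . ']') as $container) {
    $posts[] = [
        'href'     => firstValue($xpath, './a[1]/@href', $container),
        'title'    => firstValue($xpath, './/*[' . hasClass('title') . ']', $container),
        'author'   => firstValue($xpath, './/*[' . hasClass('author') . ']', $container),
        'img'      => firstValue($xpath, './/div[' . hasClass('thumb') . ']//img/@src', $container),
        'category' => firstValue($xpath, './/*[' . hasClass('category') . ']', $container),
    ];
}

print_r($posts);
```

The concat/normalize-space predicate is a common XPath 1.0 idiom for matching one class among several; a plain strpos() check, as in the question, also accepts any class name that merely contains "container".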

