dongque1958
2013-11-17 18:08

XPath for descendant-or-self while keeping the text in the same order

I am trying to extract the text from the HTML structure below using XPath. The XPath expression I am using is:

'//div[@class="descr_id"]/descendant-or-self::*/text()'

But the array I get from this expression changes the order of the text: it first gives me all the descendant text and then the element's own text. I want to get all the text from this kind of HTML structure in its original order, like "This text 1 This text 2 This text 3 ...".

<div class="descr_id">
         This text 1
         <a href="www.example.com">This text 2</a>
         This text 3 
         <a href="www.example2.com">This text 4</a>
         This text main 5
         <ul>
           <li>
           This text 6</li>
           <li>
           This text 7</li>
        </ul>
    </div>

2 answers

  • dtla92562 2013-11-17 18:35
    Best answer

    Try http://sandbox.onlinephpfunctions.com/code/99f45357f08f3833773ba7ada0f5fbf6a4b7180c which does

    $html = <<<EOD
    <div class="descr_id">
             This text 1
             <a href="www.example.com">This text 2</a>
             This text 3 
             <a href="www.example2.com">This text 4</a>
             This text main 5
             <ul>
               <li>
               This text 6</li>
               <li>
               This text 7</li>
            </ul>
        </div>
    EOD;
    
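    // Parse the HTML fragment into a DOM tree.
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    
    $xpath = new DOMXPath($doc);
    
    // Select every non-blank descendant text node of the div, in document order.
    $textNodes = $xpath->query('//div[@class="descr_id"]//text()[normalize-space()]');
    
    foreach ($textNodes as $text)
    {
      echo $text->nodeValue, "\n";
    }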
    

    and outputs the descendant text nodes in document order. The [normalize-space()] predicate skips text nodes that contain only white space. You might want to trim the values, however, if you want e.g. "This text 1" without the leading and/or trailing white space.
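
The trimming step mentioned above could look roughly like the sketch below. It is only an illustration: the helper name joinedText and the shortened sample HTML are assumptions made for this example, not part of the original answer. It trims each text node, collapses internal white space, and joins the pieces with single spaces.

```php
<?php
// Hypothetical helper (not from the original answer): collects the text nodes
// matched by $query and joins their trimmed values in document order.
function joinedText(string $html, string $query): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $parts = [];
    foreach ($xpath->query($query) as $node) {
        // Collapse runs of white space inside the node, then strip both ends.
        $parts[] = trim(preg_replace('/\s+/', ' ', $node->nodeValue));
    }
    return implode(' ', $parts);
}

// Shortened sample markup, assumed for illustration only.
$html = '<div class="descr_id"> This text 1 <a href="www.example.com">This text 2</a> This text 3 <ul><li> This text 6</li></ul></div>';
$joined = joinedText($html, '//div[@class="descr_id"]//text()[normalize-space()]');
echo $joined, "\n";
```

Joining with a single space is a design choice for this sketch; if the pieces are needed separately, returning the $parts array instead would work just as well.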
