douwei2825 2015-07-19 15:07

How can I get the exact structure of an HTML document using DOMDocument or XPath in PHP?

I have an HTML document, for example:

<!DOCTYPE html>
<html>
<head>
    <title>Webpage</title>
</head>
<body>
<div class="content">
    <div>
        <p>Paragraph</p>
    </div>
    <div>
        <a href="someurl">This is an anchor</a>
    </div>
    <p>This is a paragraph inside a div</p>
</div>
</body>
</html>

I want to grab the exact structure of the div with the class "content".

Using DOMDocument in PHP, if I fetch the div with the getElementsByTagName() method and dump it, I get this:

DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 

        Paragraph


        This is an anchor

    This is a paragraph inside a div

    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => (object value omitted)
    [nextSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 

        Paragraph


        This is an anchor

    This is a paragraph inside a div

)

How can I get this instead:

<div class="content">
    <div>
        <p>Paragraph</p>
    </div>
    <div>
        <a href="someurl">This is an anchor</a>
    </div>
    <p>This is a paragraph inside a div</p>
</div>

Is there any way of doing this?

1 answer

  • doujue6196 2015-07-19 15:31

    Suppose $str contains the HTML:

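    // Create the DOMDocument and parse the HTML
    $doc = new DOMDocument();
    $doc->loadHTML($str);
    // Find the needed div (this matches only when the class attribute is exactly "content")
    $xpath = new DOMXPath($doc);
    $elements = $xpath->query('//div[@class = "content"]');
    // Stop unless exactly one div matches; zero or several matches would be ambiguous
    if ($elements->length != 1) die("expected exactly one div with class 'content', found " . $elements->length);
    // Take the first (and only) match
    $div = $elements->item(0);
    // Serialize the node itself, markup included, rather than only its text content
    echo $doc->saveHTML($div);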
    

    Result:

    <div class="content">
        <div>
            <p>Paragraph</p>
        </div>
        <div>
            <a href="someurl">This is an anchor</a>
        </div>
        <p>This is a paragraph inside a div</p>
    </div>
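
The snippet above stops when the number of matches is not exactly one, and its XPath test matches only when the class attribute is exactly "content". As a minimal sketch of one way to relax both restrictions, the version below returns the outer HTML of every matching div and treats the class as a whitespace-separated token. The helper name outerHtmlByClass is hypothetical and not part of the original answer, the sample HTML is copied from the question, and the sketch assumes PHP 7 or later and a class name without quote characters.

```php
<?php
// Sample input copied from the question; $str is the variable name used in the answer.
$str = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
    <title>Webpage</title>
</head>
<body>
<div class="content">
    <div>
        <p>Paragraph</p>
    </div>
    <div>
        <a href="someurl">This is an anchor</a>
    </div>
    <p>This is a paragraph inside a div</p>
</div>
</body>
</html>
HTML;

/**
 * Hypothetical helper (an assumption for illustration, not from the original answer):
 * returns the outer HTML of every <div> whose class attribute contains $class as a token.
 */
function outerHtmlByClass(string $html, string $class): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token, so class="content wide" would match as well.
    // Assumption: $class contains no quote characters.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $result = [];
    foreach ($xpath->query($query) as $div) {
        // Passing a node to saveHTML() serializes that node with its markup.
        $result[] = $doc->saveHTML($div);
    }
    return $result;
}

$blocks = outerHtmlByClass($str, 'content');
foreach ($blocks as $block) {
    echo $block, PHP_EOL;
}
```

Returning an array leaves the decision about zero or multiple matches to the caller, instead of terminating the script inside the lookup.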
    
    The asker selected this answer as the best answer.