douwei2825
2015-07-19 15:07
Views: 230
Accepted

How to get the exact structure of an HTML document using DomDocument or XPath in PHP?

I have an HTML document, for example:

<!DOCTYPE html>
<html>
<head>
    <title>Webpage</title>
</head>
<body>
<div class="content">
    <div>
        <p>Paragraph</p>
    </div>
    <div>
        <a href="someurl">This is an anchor</a>
    </div>
    <p>This is a paragraph inside a div</p>
</div>
</body>
</html>

I want to grab the exact structure of the div with the class content.

Using DomDocument in PHP, if I fetch the div with the getElementsByTagName() method, I get this:

    DOMElement Object
  (
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 

        Paragraph


        This is an anchor

    This is a paragraph inside a div

    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => (object value omitted)
    [nextSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 

        Paragraph


        This is an anchor

    This is a paragraph inside a div

)

How can I get this instead:

<div class="content">
    <div>
        <p>Paragraph</p>
    </div>
    <div>
        <a href="someurl">This is an anchor</a>
    </div>
    <p>This is a paragraph inside a div</p>
</div>

Is there any way of doing this?
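
One possible approach, offered as a minimal sketch rather than a definitive answer: locate the div with DOMXPath and pass the resulting node to DOMDocument::saveHTML(), which serializes only that subtree when given a node argument (available since PHP 5.3.6). The variable names ($html, $query, $node, $markup) and the class-matching XPath expression are illustrative assumptions, not part of the original question.

```php
<?php
// Sample document from the question, kept in a nowdoc so nothing is interpolated.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
    <title>Webpage</title>
</head>
<body>
<div class="content">
    <div>
        <p>Paragraph</p>
    </div>
    <div>
        <a href="someurl">This is an anchor</a>
    </div>
    <p>This is a paragraph inside a div</p>
</div>
</body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Match a div whose class attribute contains the whole token "content"
// (so class="content main" matches, but class="contents" does not).
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]';
$node = $xpath->query($query)->item(0);

// With a node argument, saveHTML() returns the markup of that subtree only.
$markup = ($node !== null) ? $doc->saveHTML($node) : '';

echo $markup, PHP_EOL;
```

Printing the DOMElement with print_r() only shows its properties, which is why nodeValue and textContent appear as flattened text; serializing the node with saveHTML() should return the markup itself. If only the contents of the div are needed, without the wrapping tag, one option is to loop over $node->childNodes and concatenate saveHTML() for each child.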

