I have:
<html>
<head>
<title>My Page</title>
</head>
<body>
<p>paragraph 1</p>
<p>paragraph 2</p>
<p>paragraph 3</p>
<p>paragraph 4</p>
<ul>
<li>item # 1</li>
<li>item # 2</li>
<li>item # 3</li>
<li>item # 4</li>
</ul>
<a href="#">anchor 1</a>
<a href="#">anchor 2</a>
<a href="#">anchor 3</a>
<a href="#">anchor 4</a>
<div>div # 1</div>
<div>div # 2</div>
<div>div # 3</div>
<div>div # 4</div>
</body>
</html>
I want to be able to extract a specified tag, let's say a div tag, together with its contents.
So far I have:
$file = file_get_contents('file.html');
$dom = new DOMDocument();
$dom->loadHTML( $file );
$xpath = new DOMXPath( $dom );
$paragraphs = $xpath->query("/html/body//p");
for( $i = 0; $i < $paragraphs->length; $i++ )
{
    # echo the tag and its contents
}
I tried using nodeValue and textContent, but they only print the text inside the tag, not the tag itself plus its content.
This is my first time using the DOM parser in PHP. I know that using regexes to parse HTML/XML is discouraged, so I am using the DOM parser instead. Any suggestions would help.
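To make the goal concrete, here is a minimal sketch of the kind of output I am after. It assumes PHP 5.3.6 or later, where `DOMDocument::saveHTML()` accepts an optional node argument and serializes just that node. The inline `$html` string is a hypothetical stand-in for `file.html`, and `$outer` is only an illustrative variable name.

```php
<?php
// Hypothetical inline markup standing in for file.html.
$html = '<html><body><p>paragraph 1</p><div>div # 1</div><div>div # 2</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$outer = array();
foreach ($xpath->query('/html/body//div') as $div) {
    // textContent would give only "div # 1"; passing the node to
    // saveHTML() serializes the element itself along with its children.
    $outer[] = $dom->saveHTML($div);
}

echo implode("\n", $outer), "\n";
```

Each entry in `$outer` should then hold one div including its opening and closing tags, rather than just the inner text.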