I am using PHP's DOMDocument to replace a node and then write the page back to disk. The HTML that gets written back is escaped plain text (&lt; and &gt; instead of < and >) rather than real markup, so I had to convert it like so:
$content = files::readFile($data['page_path']);
$content = str_replace('&lt;', '<', $content);
$content = str_replace('&gt;', '>', $content);
if (!@fwrite($handle, $content))
{
print 'Failed to replace entities';
return FALSE;
}
This makes the HTML proper; however, for some odd reason, it adds an extra </html> tag to the bottom of the document, with some additional data after the offending </html> tag. I am at a total loss as to why.
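One thing that could produce exactly this symptom, assuming $handle comes from fopen() in 'r+' mode (how it is opened is not shown here): the decoded content is shorter than the escaped original, so writing it over the file without truncating leaves the tail of the old bytes behind. A minimal sketch of that situation, using a temp file and made-up markup as placeholders:

```php
<?php
// Hypothetical repro: the temp file and sample markup are placeholders,
// not the real page from the question.
$path = tempnam(sys_get_temp_dir(), 'page');
$escaped = '&lt;html&gt;&lt;body&gt;&lt;ul&gt;&lt;li&gt;Contact&lt;/li&gt;&lt;/ul&gt;&lt;/body&gt;&lt;/html&gt;';
file_put_contents($path, $escaped);

// Same conversion as in the question; the result is shorter than the input.
$decoded = str_replace(array('&lt;', '&gt;'), array('<', '>'), $escaped);

// 'r+' starts writing at offset 0 but does not discard the existing bytes.
$handle = fopen($path, 'r+');
fwrite($handle, $decoded);
fclose($handle);
$stale = file_get_contents($path); // decoded markup plus the leftover old tail

// Truncating before writing (or opening with 'w') drops the old tail.
$handle = fopen($path, 'r+');
ftruncate($handle, 0);
rewind($handle);
fwrite($handle, $decoded);
fclose($handle);
$clean = file_get_contents($path);

unlink($path);
```

If the real code opens the file differently, this does not apply; it is only meant to show how stale trailing data can appear after a shorter write.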
Anyway, I thought about using:
$content = preg_replace('#\<\/head\>*(:alphanum:)#', '</html>', $content);
to remove it, but this doesn't match the way I expected it to.
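For comparison, here is a rough sketch of the kind of pattern I was aiming for: keep everything up to and including the first </html> and drop whatever follows. The function name trimAfterClosingHtml and the sample string are made up for illustration, and this only hides the symptom rather than explaining where the extra data comes from:

```php
<?php
// Hypothetical helper; the name is not part of the original code.
function trimAfterClosingHtml($content)
{
    // (</html>) captures the first closing tag; with the "s" modifier,
    // .* also matches newlines, so it swallows everything after that tag.
    // The "i" modifier makes the tag match case-insensitive.
    return preg_replace('#(</html>).*#si', '$1', $content);
}

// Made-up sample: a valid document followed by stale trailing data.
$sample = "<html><body><p>ok</p></body></html>\nli>\n<li>stale</li></body></html>";
$trimmed = trimAfterClosingHtml($sample);
```

Because the regex engine takes the leftmost match, the capture lands on the first </html>, not the last one.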
Help please!
Testing example:
$html = '
<div id="footer">
<div class="wrap">
<strong class="logo"><a href="#">College</a></strong>
<ul><li><a href="#">Emergencies</a></li>
<li><a href="#">Contact</a></li>
<li><a href="#">Copyright</a></li>
<li><a href="#">Terms of Use</a></li>
<li><a href="#">Member of The Colleges</a></li>
</ul><p>© 2010 College</p>
</div>
</div>
</body></html>
li>
<li><a href="#">Contact</a></li>
<li><a href="#">Copyright</a></li>
<li><a href="#">Terms of Use</a></li>
<li><a href="#">Member of The Colleges</a></li>
</ul><p>© 2010 College</p>
</div>
</div>
</body></html>';
preg_match("#</head>.*#si", $html, $matches);
var_dump($matches);