Possible Duplicate:
Cleaning HTML by removing extra/redundant formatting tags
I have been trying to remove redundant tags generated by HTML composers. The code below does not remove all of the empty ones. I have been looking at it for some time and cannot figure out why; there may be something I am missing.
Below is the code. Thanks a lot, everyone.
// Check for redundant tags
function removeRedundantTags($pathname) {
    $dom = new DOMDocument();
    $dom->loadHTMLFile($pathname);
    $allTags = $dom->getElementsByTagName('*');
    for ($i = 0; $i < $allTags->length; $i++) {
        $currentTag = $allTags->item($i);
        echo "Accessed Tags: ".$currentTag->nodeName.'<br>';
        // Skip anything that still has children (text or elements)
        if ($currentTag->hasChildNodes()) continue;
        // Skip tags that are legitimately empty
        if ($currentTag->nodeName == 'br' || $currentTag->nodeName == 'img' || $currentTag->nodeName == 'meta') continue;
        if ($currentTag->nodeValue == NULL) {
            $parentNode = $currentTag->parentNode;
            $oldChild = $parentNode->removeChild($currentTag);
            echo "Removed Tags----: ".$oldChild->nodeName.'<br>';
        }
    }
    echo "Redundant Removed<br>";
    $dom->saveHTMLFile($pathname);
}
Edit (output added): Let's say I am trying to clean up span tags (sorry, I am not able to post the HTML code). It removes only half of them: when two empty span tags are present, it removes only one, and the same applies to all of the other empty tags.
I am using the DOM because it is very fast, and I will be running this code on hundreds of HTML files. Some of the answers use regular expressions, which are not helpful here.
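A likely explanation for the "only one of two is removed" symptom is that the list returned by getElementsByTagName() is live: removing item $i shifts every later item down by one, so the following $i++ steps over the next element. Below is a minimal sketch of one way around that, walking the list backwards. The function name removeEmptyTagsSketch and the use of an HTML string instead of a file path are assumptions made so the example can run on its own; the skip list is the same three tags as in the code above.

```php
<?php
// Hypothetical helper for illustration; takes an HTML string rather than a path.
function removeEmptyTagsSketch(string $html): string {
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Same skip list as the original code; real pages may need more entries
    $skipTags = ['br', 'img', 'meta'];
    $allTags = $dom->getElementsByTagName('*');

    // Iterate from the end: removing item $i only shifts items after it,
    // so indexes that are still to be visited stay valid.
    for ($i = $allTags->length - 1; $i >= 0; $i--) {
        $currentTag = $allTags->item($i);
        if ($currentTag->hasChildNodes()) continue;
        if (in_array($currentTag->nodeName, $skipTags, true)) continue;
        $currentTag->parentNode->removeChild($currentTag);
    }
    return $dom->saveHTML();
}

// Usage: two adjacent empty spans, the case described in the edit above
$out = removeEmptyTagsSketch('<p>keep<span></span><span></span></p>');
```

Because the list is in document order, walking it backwards also visits children before their parents, so an element that becomes empty once its empty children are gone is removed in the same pass. If that is not wanted, copying the list first with iterator_to_array($allTags) and looping over the copy should avoid the skipping without that side effect.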