In other words, you would like to remove any element node that has no text content, no attributes, and no descendants with text content or attributes, and that has a parent element node (that is, it is not the document element).
XPath provides the function normalize-space(),
which collapses each whitespace sequence into a single space and strips whitespace from the start and end. Whitespace-only content therefore results in an empty string.
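As a quick illustration of that behaviour, here is a minimal sketch using PHP's DOM extension. The sample document and its element names are made up for this demonstration.

```php
<?php
// Sample document; the element names are made up for this demonstration.
$document = new DOMDocument();
$document->loadXML('<root><a>  hello   world </a><b>   </b></root>');
$xpath = new DOMXPath($document);

// Inner whitespace runs collapse to one space, outer whitespace is stripped.
$collapsed = $xpath->evaluate('normalize-space(/root/a)');
var_dump($collapsed); // string(11) "hello world"

// Whitespace-only content becomes an empty string.
$blank = $xpath->evaluate('normalize-space(/root/b)');
var_dump($blank); // string(0) ""
```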
The XPath expression
//*
fetches every element node in the document as a node list. You just need to add conditions as predicates.
- Has no text content
normalize-space(.) = ""
- No attributes
not(@*)
- No descendant node with content (includes comments, ...)
not(.//node()[normalize-space(.) != ""])
- No descendant element nodes with attributes
not(.//*[@*])
- Has a parent element node
parent::*
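To see what each condition contributes, the sketch below evaluates them one at a time against a few sample elements. The labels and the sample XML are assumptions for illustration only. Note that parent::* is wrapped in boolean() here, because evaluated on its own it returns a node list rather than true/false.

```php
<?php
$document = new DOMDocument();
$document->loadXML('<foo><bar/><bar foo="123"/><bar><!-- test --></bar></foo>');
$xpath = new DOMXPath($document);

// The five conditions from the list, keyed by an illustrative label.
$conditions = [
    'no text'             => 'normalize-space(.) = ""',
    'no attributes'       => 'not(@*)',
    'no content below'    => 'not(.//node()[normalize-space(.) != ""])',
    'no attributes below' => 'not(.//*[@*])',
    'has parent element'  => 'boolean(parent::*)',
];

$results = [];
foreach ($xpath->evaluate('/foo/bar') as $index => $bar) {
    foreach ($conditions as $label => $condition) {
        // With a context node, a boolean expression evaluates to true/false.
        $results[$index][$label] = $xpath->evaluate($condition, $bar);
    }
}
var_dump($results);
```

Only the first bar passes every check. The second fails "no attributes", and the third fails "no content below" because of its comment.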
Put together:
$xml = <<<'XML'
<foo>
<bar></bar>
<bar>123</bar>
<bar foo="123"></bar>
<bar><foo> </foo></bar>
<bar><!-- test --></bar>
</foo>
XML;
$document = new DOMDocument();
// Drop whitespace-only text nodes on load so the output can be re-indented.
$document->preserveWhiteSpace = FALSE;
$document->formatOutput = TRUE;
$document->loadXML($xml);
$xpath = new DOMXPath($document);
// All five conditions combined into a single predicate.
$expression =
  '//*[
    normalize-space(.) = "" and
    not(@*) and
    not(.//node()[normalize-space(.) != ""]) and
    not(.//*[@*]) and
    parent::*
  ]';
$nodes = $xpath->evaluate($expression);
// Iterate in reverse document order so descendants are removed before their ancestors.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
  $nodes[$i]->parentNode->removeChild($nodes[$i]);
}
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<foo>
  <bar>123</bar>
  <bar foo="123"/>
  <bar>
    <!-- test -->
  </bar>
</foo>
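If you need this in more than one place, the same logic could be wrapped in a small function. This is only a sketch: the function name removeEmptyElements() and its return value (the number of removed elements) are assumptions, not part of any library.

```php
<?php
/**
 * Hypothetical helper: removes elements that carry no text, attributes,
 * or other content, and returns how many were removed.
 */
function removeEmptyElements(DOMDocument $document): int
{
    $xpath = new DOMXPath($document);
    $expression = '//*[
        normalize-space(.) = "" and
        not(@*) and
        not(.//node()[normalize-space(.) != ""]) and
        not(.//*[@*]) and
        parent::*
    ]';
    // Copy the matches into an array so removals cannot disturb iteration.
    $nodes = iterator_to_array($xpath->evaluate($expression), false);
    // Reverse document order: descendants are removed before their ancestors.
    foreach (array_reverse($nodes) as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($nodes);
}

$document = new DOMDocument();
$document->loadXML('<foo><bar/><bar>123</bar><bar><foo> </foo></bar></foo>');
$removed = removeEmptyElements($document);
echo $removed, "\n"; // 3
echo $document->saveXML($document->documentElement), "\n"; // <foo><bar>123</bar></foo>
```

Because of the parent::* condition, the document element itself is never removed, even if it ends up empty.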