There is a method DOMDocument::saveHTML(). Since PHP 5.3.6 it accepts an optional node argument and returns the HTML of just that node. To get the inner HTML of a node, save each of its child nodes and concatenate the results:
function getHtml($nodes) {
    $result = '';
    foreach ($nodes as $node) {
        // serialize each node with the document it belongs to
        $result .= $node->ownerDocument->saveHTML($node);
    }
    return $result;
}
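To make the difference between outer and inner HTML concrete, here is a minimal sketch. The markup is a made-up sample and the variable names are only illustrative; it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical sample markup, used only for this illustration.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div id="content">i am the <b>content</b>.</div></body></html>');

$div = $dom->getElementsByTagName('div')->item(0);

// Outer HTML: the node itself, including its start and end tags.
$outer = $dom->saveHTML($div);

// Inner HTML: serialize every child node and concatenate the results.
$inner = '';
foreach ($div->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $outer, "\n";
echo $inner, "\n";
```

The first line printed includes the div tags, the second only what is between them.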
To fetch the nodes, you can use XPath. Selecting by id is the easy case.
Fetch all element nodes:
//*
Restrict them to those whose id attribute is "content":
//*[@id="content"]
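Use only the first matching node in the document, in case somebody added the same id multiple times. The parentheses matter: without them, [1] would select the first match under each parent rather than the first match overall.
(//*[@id="content"])[1]
Get the child nodes; node() matches elements, text nodes and several other node types:
(//*[@id="content"])[1]/node()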
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// parentheses make [1] apply to the whole result, not to each parent
echo getHtml($xpath->evaluate('(//*[@id="content"])[1]/node()'));
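One XPath subtlety is worth checking when ids are duplicated: a position predicate written directly after a step counts per parent, while a predicate after a parenthesized expression counts within the whole result. A small sketch, assuming made-up markup that repeats the id under two different parents (libxml reports the duplicate id as an error, so the errors are collected internally here):

```php
<?php
// Duplicate ids are invalid; keep libxml from emitting warnings about them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body>'
    . '<div><p id="content">first</p></div>'
    . '<div><p id="content">second</p></div>'
    . '</body></html>'
);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// [1] is evaluated for each parent: both <p> elements are the first match in their <div>.
$perParent = $xpath->evaluate('//*[@id="content"][1]');

// [1] is evaluated on the complete result: only the first <p> in document order remains.
$overall = $xpath->evaluate('(//*[@id="content"])[1]');

echo $perParent->length, "\n"; // 2
echo $overall->length, "\n";   // 1
```

If every duplicate sits under the same parent, both forms return the same node, which is why the difference is easy to miss.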
The class attribute is a little more complex. Class attributes are token lists; they can contain several class names. Here is a trick for matching them: the XPath function normalize-space() strips leading and trailing whitespace and collapses each run of whitespace into a single space. Add a space to the front and to the end and you get a string like " one two three ". Now you can check whether " one " is part of that string. In XPath:
Normalize the class attribute:
normalize-space(@class)
Add spaces to start and end:
concat(" ", normalize-space(@class), " ")
Check whether it contains the class name surrounded by spaces:
contains(concat(" ", normalize-space(@class), " "), " title ")
Use it as a condition to limit the nodes, again taking only the first match in the document:
(//*[contains(concat(" ", normalize-space(@class), " "), " title ")])[1]/node()
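To see why the extra spaces are needed, compare the token match with a plain substring test. This is only a sketch with made-up markup; the class names "main" and "subtitle" are not part of the question.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body>'
    . '<h1 class="main title">Heading</h1>'
    . '<p class="subtitle">Tagline</p>'
    . '</body></html>'
);
$xpath = new DOMXPath($dom);

// Plain substring test: also matches "subtitle", because it contains "title".
$naive = $xpath->evaluate('//*[contains(@class, "title")]');

// Token test: matches only elements that have "title" as a complete class name.
$token = $xpath->evaluate(
    '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]'
);

echo $naive->length, "\n"; // 2
echo $token->length, "\n"; // 1
```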
Put together:
$html = <<<'HTML'
<html>
<title></title>
<body>
<h1 class="title">I am title</h1>
<div id="content">
i am the <b>content</b>.
</div>
</body>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
function getHtml($nodes) {
    $result = '';
    foreach ($nodes as $node) {
        $result .= $node->ownerDocument->saveHTML($node);
    }
    return $result;
}
// first node with the id (parentheses make [1] apply to the whole result)
var_dump(
    getHtml(
        $xpath->evaluate('(//*[@id="content"])[1]/node()')
    )
);
// first node with the class
var_dump(
    getHtml(
        $xpath->evaluate(
            '(//*[contains(concat(" ", normalize-space(@class), " "), " title ")])[1]/node()'
        )
    )
);
// alternative: handle every node that has the class, in a loop
$nodes = $xpath->evaluate(
    '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]'
);
foreach ($nodes as $node) {
    // the second argument is the context node for the relative expression
    var_dump(getHtml($xpath->evaluate('node()', $node)));
}
Output (https://eval.in/118248). The length of the first string counts the whitespace around the text inside the div:
string(40) "
i am the <b>content</b>.
"
string(10) "I am title"
string(10) "I am title"
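If the class test is needed in several places, the expression can be generated instead of typed out each time. The helper below is hypothetical (classCondition() is not part of PHP or of the code above), needs PHP 7 or later for the type declarations, and assumes the class name contains neither whitespace nor double quotes.

```php
<?php
// Hypothetical helper: builds the XPath condition for a single class name.
// Assumes $class contains no whitespace and no double quotes.
function classCondition(string $class): string
{
    return sprintf(
        'contains(concat(" ", normalize-space(@class), " "), " %s ")',
        $class
    );
}

// Made-up sample markup for the illustration.
$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body>'
    . '<h1 class="big title">I am title</h1>'
    . '<p class="subtitle">sub</p>'
    . '</body></html>'
);
$xpath = new DOMXPath($dom);

// Select every element that carries the class "title".
$nodes = $xpath->evaluate('//*[' . classCondition('title') . ']');
foreach ($nodes as $node) {
    echo $node->nodeName, "\n";
}
```

Only the h1 element is selected here; the paragraph with class "subtitle" does not match.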