Using preg_match would indeed be too complicated. As stated on this site many times before: regex and HTML don't mix well. Regex is not suitable for processing markup. A DOM parser, however, is:
$dom = new DOMDocument; // create parser
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom); // create XPath instance for the DOM, so we can query using XPath
$elemsWithHref = $xpath->query('//*[@href]'); // get any node that has an href attribute
$hrefs = array(); // all href values
foreach ($elemsWithHref as $node)
{
    $hrefs[] = $node->getAttributeNode('href')->value; // collect each href value
}
After this, it's a simple matter of processing the values in $hrefs, which will be an array of strings, each of which is the value of an href attribute.
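As a rough, self-contained sketch of that processing step: the helper name extractHrefs, the sample markup, and the "keep only absolute URLs" filter are all assumptions made up for illustration, not part of the original question. The libxml_use_internal_errors() calls are only there to keep loadHTML() quiet about sloppy real-world markup.

```php
<?php
// Hypothetical helper (name is an assumption): collect every href value in an HTML string.
function extractHrefs(string $htmlString): array
{
    // Suppress libxml warnings about imperfect markup, then restore the old setting
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($htmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = array();
    foreach ($xpath->query('//*[@href]') as $node) {
        // '//*[@href]' only matches elements, so getAttribute() is available
        $hrefs[] = $node->getAttribute('href');
    }
    return $hrefs;
}

// Sample input, invented for this example
$html = '<p><a href="/home">Home</a> and <a href="http://example.com/">Example</a></p>';
$hrefs = extractHrefs($html);

// One possible way to "process" the values: keep only absolute http(s) URLs
$absolute = array_values(array_filter($hrefs, function ($href) {
    return in_array(parse_url($href, PHP_URL_SCHEME), array('http', 'https'), true);
}));
```

Here $hrefs holds both values in document order, and $absolute holds only the second one. Swap the filter for whatever processing you actually need.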
Another example of using DOM parsers and XPath (to show you what they can do) can be found here.
To replace the nodes with the href values, it's a simple matter of:
- Getting the parent node
- Constructing a text node
- Calling replaceChild on the parent node (DOMNode::replaceChild, which DOMDocument inherits)
- Finishing up by calling save to write to a file, or saveHTML or saveXML to get the DOM as a string
An example:
$dom = new DOMDocument; // create parser
$dom->loadHTML($htmlString);
$xpath = new DOMXPath($dom); // create XPath instance for the DOM, so we can query using XPath
$elemsWithHref = $xpath->query('//*[@href]'); // get any node that has an href attribute
foreach ($elemsWithHref as $node)
{
    $parent = $node->parentNode;
    $replace = new DOMText($node->getAttributeNode('href')->value); // create text node
    $parent->replaceChild($replace, $node); // replaces $node with the $replace text node
}
$newString = $dom->saveHTML(); // the modified document as a string
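One thing to be aware of: loadHTML() wraps a bare fragment in html and body elements (and saveHTML() adds a doctype), so $newString will contain more than the markup you passed in. A minimal sketch of one way around that, assuming you only want the fragment back, is below. The helper name replaceHrefElements and the sample markup are made up for illustration, and it uses $dom->createTextNode() in place of new DOMText, which produces the same kind of node.

```php
<?php
// Hypothetical helper (name is an assumption): swap each element that has an
// href attribute for a text node containing that href's value.
function replaceHrefElements(string $htmlString): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($htmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@href]') as $node) {
        $text = $dom->createTextNode($node->getAttribute('href'));
        $node->parentNode->replaceChild($text, $node);
    }

    // Serialize only the children of <body>, skipping the wrapper loadHTML() added
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $dom->saveHTML(); // no body found: fall back to the whole document
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Sample input, invented for this example
$html = '<p>Visit <a href="http://example.com/">our site</a> today.</p>';
$newString = replaceHrefElements($html);
```

With this input, the link element is gone from $newString and its URL appears as plain text inside the paragraph.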