I have a string of HTML, and I need to check whether the href attributes of any anchors contain a certain link pattern. If they match that pattern, I need to modify them.
Here's a sample HTML string:
```html
<p>Disculpa, pero esta entrada está disponible sólo en <a href="http://www.example.com/static/?json=get_page&post_type=page&slug=sample-page&lang=ru">Pусский</a> y <a href="http://www.example.com/static/?json=get_page&post_type=page&sample-page&lang=en">English</a>.</p>
```
So the URLs in question follow this pattern:
http://www.example.com/static/?json=get_page&post_type=page&slug=sample-page&lang=ru
where the value of the lang query parameter varies.
If an href matching that pattern is found, I need to change it to:
http://www.example.com/ru/sample-page
In other words, I need to replace 'static' with the value of the lang parameter, append the value of the 'slug' parameter to the end of the path, and drop the query string.
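The rewrite rule above can be expressed as a small function over a single URL. This is a minimal sketch, assuming a URL counts as a match when its path is exactly /static/ and its query string contains json=get_page plus non-empty slug and lang values; rewriteStaticUrl() is a hypothetical helper name, built only on PHP's parse_url() and parse_str(). It returns null for anything that does not match, so callers can leave those links untouched.

```php
<?php
// Minimal sketch: turn
//   http://www.example.com/static/?json=get_page&post_type=page&slug=SLUG&lang=LANG
// into
//   http://www.example.com/LANG/SLUG
// rewriteStaticUrl() is a hypothetical helper, not a library function.
// Assumption: only /static/ URLs with json=get_page, slug and lang count as matches.
function rewriteStaticUrl($url)
{
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['host'], $parts['path'], $parts['query'])) {
        return null; // not an absolute URL with a query string
    }
    if ($parts['path'] !== '/static/') {
        return null; // only URLs under /static/ are rewritten
    }

    // parse_str() decodes "a=1&b=2" into array('a' => '1', 'b' => '2').
    parse_str($parts['query'], $query);
    if (!isset($query['json'], $query['slug'], $query['lang'])
        || $query['json'] !== 'get_page'
        || $query['slug'] === '' || $query['lang'] === '') {
        return null;
    }

    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    // Replace "static" with the language and append the slug; the query string is dropped.
    return $scheme . '://' . $parts['host'] . '/'
        . rawurlencode($query['lang']) . '/' . rawurlencode($query['slug']);
}

echo rewriteStaticUrl('http://www.example.com/static/?json=get_page&post_type=page&slug=sample-page&lang=ru'), PHP_EOL;
// prints http://www.example.com/ru/sample-page
```

Parsing the query string rather than matching it with a regular expression means the parameters can appear in any order.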
Sadly, I'm stuck at the first step, so I haven't even been able to try out ways of parsing the URLs and replacing them with the new values.
```php
$html = '<p>Disculpa, pero esta entrada está disponible sólo en <a href="http://www.example.com/static/?json=get_page&post_type=page&slug=sample-page&lang=ru">Pусский</a> y <a href="http://www.example.com/static/?json=get_page&post_type=page&sample-page&lang=en">English</a>.</p>';
$dom = new DOMDocument;
// Convert the UTF-8 input to HTML entities; without a charset declaration,
// DOMDocument::loadHTML() assumes ISO-8859-1 and mangles non-ASCII characters
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
$anchors = $dom->getElementsByTagName('a');
```
In theory, from this point on I'd loop through the anchors found and process each one, but if I var_dump the $anchors variable I just get:
object(DOMNodeList)#66 (0) { }
So I can't even proceed further!
Any idea what's causing the DOM to fail to collect the anchors?
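Whatever var_dump prints on a given PHP version, a DOMNodeList exposes a length property and an item() method, so the result of getElementsByTagName() can be checked directly. A minimal sketch, assuming a shortened, ASCII-only copy of the sample markup (the Russian link text is replaced with "Russian" so no encoding conversion is needed):

```php
<?php
// Sketch: check what getElementsByTagName() returned without relying on var_dump().
// length and item() are standard DOMNodeList members.
libxml_use_internal_errors(true); // the hrefs contain bare '&', which libxml warns about

// Assumption: an ASCII-only, shortened version of the sample is enough for a count.
$html = '<p>Sample: <a href="http://www.example.com/static/?json=get_page&post_type=page&slug=sample-page&lang=ru">Russian</a>'
      . ' y <a href="http://www.example.com/static/?json=get_page&post_type=page&sample-page&lang=en">English</a>.</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');

echo 'Anchors found: ', $anchors->length, PHP_EOL;
for ($i = 0; $i < $anchors->length; $i++) {
    // item() returns a DOMElement here; getAttribute() returns '' if the attribute is missing.
    echo $i, ': ', $anchors->item($i)->getAttribute('href'), PHP_EOL;
}
```

If length reports 2, the anchors were found and only the var_dump display was misleading.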
After that, any suggestions on how best to check whether an anchor's href matches the URL pattern, change it, and return the modified HTML?
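One way to do the whole job: load the fragment, loop over the anchors, rewrite each href that matches, then serialize only the body's children so the html/body wrapper that loadHTML() adds does not end up in the result. This is a sketch under a few assumptions: rewriteStaticUrl() and rewriteAnchors() are hypothetical helper names, the dom and mbstring extensions are available, and mb_encode_numericentity() stands in for the 'HTML-ENTITIES' conversion used in the question (that mode of mb_convert_encoding() is deprecated as of PHP 8.2).

```php
<?php
// Sketch: rewrite matching hrefs inside an HTML fragment and return the fragment.
// rewriteStaticUrl() and rewriteAnchors() are hypothetical helper names.
// Assumes the dom and mbstring extensions are loaded.

// Rule from the question: /static/?json=get_page&...&slug=S&lang=L  ->  /L/S
function rewriteStaticUrl($url)
{
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['host'], $parts['path'], $parts['query'])
        || $parts['path'] !== '/static/') {
        return null;
    }
    parse_str($parts['query'], $query);
    if (!isset($query['json'], $query['slug'], $query['lang'])
        || $query['json'] !== 'get_page'
        || $query['slug'] === '' || $query['lang'] === '') {
        return null;
    }
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    return $scheme . '://' . $parts['host'] . '/'
        . rawurlencode($query['lang']) . '/' . rawurlencode($query['slug']);
}

function rewriteAnchors($html)
{
    // The hrefs contain bare '&', which makes libxml emit warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Turn every non-ASCII character into a numeric entity so loadHTML(),
    // which otherwise assumes ISO-8859-1, cannot misread the UTF-8 text.
    $dom->loadHTML(mb_encode_numericentity($html, array(0x80, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8'));

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $newHref = rewriteStaticUrl($anchor->getAttribute('href'));
        if ($newHref !== null) {
            $anchor->setAttribute('href', $newHref);
        }
    }

    // loadHTML() wraps the fragment in <html><body>; serialize only the body's children.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

$html = '<p>Disculpa, pero esta entrada está disponible sólo en <a href="http://www.example.com/static/?json=get_page&post_type=page&slug=sample-page&lang=ru">Pусский</a> y <a href="http://www.example.com/static/?json=get_page&post_type=page&sample-page&lang=en">English</a>.</p>';
echo rewriteAnchors($html), PHP_EOL;
```

Note that the second link in the sample reads &sample-page rather than &slug=sample-page, so this sketch leaves it unchanged; if that is a typo in the sample, both links would be rewritten. Non-ASCII text in the output may come back as HTML entities, which browsers render the same way.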
Update 1
So it turns out that there's a bug in PHP versions before 5.4.1 that prevents var_dump from displaying the contents of a DOMNodeList. I can read the values with:
```php
// Use a separate loop variable; reusing $anchors would overwrite the node list
foreach ($anchors as $anchor) {
    echo $anchor->nodeValue, PHP_EOL;
}
```
However, I have no idea what the $anchors object really looks like, so I'm running blind. If anyone has suggestions on how to iterate over $anchors and modify the hrefs as described above, that would be hugely appreciated (whilst I try to set up a PHP 5.4.1 instance).
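Each entry in $anchors is a DOMElement, and it can be inspected without var_dump through ordinary DOM properties: nodeName, textContent, the attributes map, and saveHTML() with a node argument to print just that element's markup. A minimal sketch on a shortened, ASCII-only sample (an assumption to keep it self-contained):

```php
<?php
// Sketch: look inside each DOMElement from a DOMNodeList without var_dump().
libxml_use_internal_errors(true); // the sample href contains bare '&'

// Assumption: a shortened, ASCII-only version of the sample markup.
$html = '<p>Sample: <a href="http://www.example.com/static/?json=get_page&post_type=page&slug=sample-page&lang=ru">Russian</a></p>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');

// A separate $anchor variable keeps $anchors pointing at the list.
foreach ($anchors as $anchor) {
    echo 'Element: ', $anchor->nodeName, PHP_EOL;
    echo 'Text:    ', $anchor->textContent, PHP_EOL;
    // $anchor->attributes is a DOMNamedNodeMap of DOMAttr objects.
    foreach ($anchor->attributes as $attr) {
        echo '  @', $attr->name, ' = ', $attr->value, PHP_EOL;
    }
    // saveHTML() with a node argument serializes just that node.
    echo 'Markup:  ', $dom->saveHTML($anchor), PHP_EOL;
}
```

Reading and changing the link then comes down to $anchor->getAttribute('href') and $anchor->setAttribute('href', $newValue).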