I which to extract all the link include on page with anchor or alt attribute on image include in the links if this one come first.
$html = '<a href="lien.fr">Anchor</a>';
Must return "lien.fr;Anchor"
$html = '<a href="lien.fr"><img alt="Alt Anchor">Anchor</a>';
Must return "lien.fr;Alt Anchor"
$html = '<a href="lien.fr">Anchor<img alt="Alt Anchor"></a>';
Must return "lien.fr;Anchor"
I did:
$doc = new DOMDocument();
$doc->loadHTML($html);
$out = "";
$n = 0;
$links = $doc->getElementsByTagName('a');
foreach ($links as $element) {
$href = $img_alt = $anchor = "";
$href = $element->getAttribute('href');
$n++;
if (!strrpos($href, "panier?")) {
if ($element->firstChild->nodeName == "img") {
$imgs = $element->getElementsByTagName('img');
foreach ($imgs as $img) {
if ($anchor = $img->getAttribute('alt')) {
break;
}
}
}
if (($anchor == "") && ($element->nodeValue)) {
$anchor = $element->nodeValue;
}
$out[$n]['link'] = $href;
$out[$n]['anchor'] = $anchor;
}
}
This seems to work but if there some space or indentation it doesn't as
$html = '<a href="link.fr">
<img src="ceinture-gris" alt="alt anchor"/>
</a>';
the $element->firstChild->nodeName will be text