I have been trying to pull titles from a page. Everything seems to work so far, but I'm getting doubled results. For example, I'm getting h3 titles that appear only once on the rendered page but twice in the source.
Here is an example:
<span data-img-type='cvr' data-img-att-alt='Cover of Greek Mythology' data-img-size-xs='image.jpg'></span>
<h3> Cover of Greek Mythology </h3>
This returns:
Cover of Greek Mythology
Cover of Greek Mythology
I'm targeting only h3 elements, but the titles still appear twice. How can I remove the repeated elements?
Here is what I have so far:
$html = file_get_contents('https://example.com/');

$scriptDocument = new DOMDocument();
// Suppress the warnings libxml emits for malformed real-world HTML
libxml_use_internal_errors(true);

if (!empty($html)) {
    $scriptDocument->loadHTML($html);
    libxml_clear_errors();

    $scriptDOMXPath = new DOMXPath($scriptDocument);

    // Get all h3 elements that have a class attribute
    $scriptRow = $scriptDOMXPath->query('//h3[@class]');

    // Print each matched title, if any were found
    if ($scriptRow->length > 0) {
        foreach ($scriptRow as $row) {
            echo $row->nodeValue . "<br/>";
        }
    }
}
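One way to drop the repeats, assuming the duplicates are h3 elements with the same text in the source, is to remember each title you have already collected and skip it when it shows up again. Below is a minimal sketch of that idea. The `uniqueTitles()` helper and the sample HTML are hypothetical (the sample assumes the page really contains the same h3 twice), and whitespace is normalized so that " Cover of Greek Mythology " and "Cover of Greek Mythology" count as the same title.

```php
<?php
// Minimal sketch: collect h3 titles and skip any whose text was already seen.
// The sample HTML below is hypothetical and assumes the source really
// contains the same <h3> twice.

/**
 * Hypothetical helper: return the normalized text of every node matched by
 * $query, keeping only the first occurrence of each distinct title.
 */
function uniqueTitles(DOMXPath $xpath, string $query): array
{
    $seen = [];
    $titles = [];
    foreach ($xpath->query($query) as $node) {
        // Collapse runs of whitespace and trim the ends so spacing differences don't matter
        $text = trim(preg_replace('/\s+/', ' ', $node->nodeValue));
        if ($text === '' || isset($seen[$text])) {
            continue; // skip empty headings and titles we've already collected
        }
        $seen[$text] = true;
        $titles[] = $text;
    }
    return $titles;
}

$html = <<<HTML
<html><body>
<h3 class="title"> Cover of Greek Mythology </h3>
<h3 class="title">Cover of Greek Mythology</h3>
<h3 class="title"> Another Book </h3>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach (uniqueTitles($xpath, '//h3[@class]') as $title) {
    // Escape the text before echoing it into an HTML page
    echo htmlspecialchars($title) . "<br/>\n";
}
```

If you would rather keep the original loop, a similar effect comes from collecting the trimmed nodeValue strings into an array and calling array_unique() on it before echoing.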