dongwei8440 2018-09-19 03:46 Acceptance rate: 100%

How to exclude duplicated DOMDocument elements

I am trying to pull titles from a page. Everything seems to work so far, but I get doubled results. For example, when I fetch the h3 titles, each title is shown once on the rendered page but appears twice in the source.

Here is an example:

<span data-img-type='cvr' data-img-att-alt='Cover of Greek Mythology' data-img-size-xs='image.jpg'></span>
<h3> Cover of Greek Mythology </h3>

This returns:

Cover of Greek Mythology
Cover of Greek Mythology

I'm targeting only h3 elements, but the results still come back doubled. How can I remove the repeated elements?

Here is what I have so far:

$html = file_get_contents('https://example.com/');

$scriptDocument = new DOMDocument();

// Collect libxml parse warnings internally instead of printing them
libxml_use_internal_errors(true);

if (!empty($html)) {

    $scriptDocument->loadHTML($html);
    libxml_clear_errors();
    $scriptDOMXPath = new DOMXPath($scriptDocument);
    // get all the h3 elements that have a class attribute
    $scriptRow = $scriptDOMXPath->query('//h3[@class]');
    // print each match, if any were found
    if ($scriptRow->length > 0) {
        foreach ($scriptRow as $row) {
            echo $row->nodeValue . "<br/>";
        }
    }
}
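
One possible way to drop the repeats is to remember which titles have already been printed and skip any that come up again. The sketch below is only an illustration under some assumptions: the helper name `uniqueNodeTexts` and the inline sample markup are hypothetical (not from the post), and it assumes the page really contains each heading twice, for instance in two separate blocks. It normalizes whitespace before comparing, so ` Cover of Greek Mythology ` and `Cover of Greek Mythology` count as the same title.

```php
<?php
// Hypothetical helper (not part of the original post): returns the text of
// every node matched by $query exactly once, in first-seen order.
function uniqueNodeTexts(string $html, string $query): array
{
    if ($html === '') {
        return [];
    }

    // Collect parse warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $seen = [];
    foreach ($xpath->query($query) as $node) {
        // Collapse runs of whitespace and trim, so padded copies compare equal.
        $text = trim((string) preg_replace('/\s+/', ' ', $node->nodeValue));
        if ($text !== '' && !isset($seen[$text])) {
            $seen[$text] = $text; // key marks the text as seen, value keeps it as a string
        }
    }
    return array_values($seen);
}

// Assumed sample markup: the same heading appears in two places.
$sampleHtml = <<<'HTML'
<html><body>
<div class="desktop"><h3 class="title"> Cover of Greek Mythology </h3></div>
<div class="mobile"><h3 class="title">Cover of Greek Mythology</h3></div>
<h3 class="title">Another Title</h3>
</body></html>
HTML;

$titles = uniqueNodeTexts($sampleHtml, '//h3[@class]');
foreach ($titles as $title) {
    echo htmlspecialchars($title) . "<br/>\n";
}
```

If the duplicates turn out to live in one identifiable container, narrowing the XPath query to the other container may be a cleaner fix than filtering afterwards. Whether that applies depends on the actual page markup, which the question does not show.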