doy2255 2016-03-26 10:28
浏览 38
已采纳

PHP DOM从页面获取所有href并删除重复项

I am creating a script that gets all links from an website and checking every link if is broken. My problem is, I need to display all links founded but I need to verify only the unique links not the duplicate. For example if a website has 4 links to google.com then I want to check only one time not four times.

foreach ($dom->getElementsByTagName('a') as $node) {

$info = $node->getAttribute( 'href' );

///The function that checks for broken links working. 
$check_url_status = check_url($info);

if ($check_url_status == '404') {

$badresult = 'Not working';

}else{

$badresult = 'Working';

}

$showlist .= '<li>The '.$info.' is '.$badresult.'</li>';

}


echo '<ul>'.$showlist.'</ul>';

This code is works, but I need to make it check the http status only 1 time for duplicate links.

I do not have idea of how to do this and also if is posible to do something like that.

  • 写回答

2条回答 默认 最新

  • dongtan6206 2016-03-26 10:36
    关注

    You can make an array which you can use to save all already checked links in it. You will always check before checking the status if the link is already in the array or not. If so, skip link. If not, check status and add link to the array. You can skip an element by using the keyword continue.

    $links = array();
    foreach ($dom->getElementsByTagName('a') as $node) {
        $info = $node->getAttribute('href');
    
        if(!isset($links[$info])) {
            ///The function that checks for broken links working.
            $check_url_status = check_url($info);
            $links[$info] = $check_url_status;
        } else {
            $check_url_status = $links[$info];
        }
    
        if ($check_url_status == '404') {
            $badresult = 'Not working';
        } else {
            $badresult = 'Working';
        }
    
        $showlist .= '<li>The '.$info.' is '.$badresult.'</li>';
    }
    
    echo '<ul>'.$showlist.'</ul>';
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 Python中的request,如何使用ssr节点,通过代理requests网页。本人在泰国,需要用大陆ip才能玩网页游戏,合法合规。
  • ¥100 为什么这个恒流源电路不能恒流?
  • ¥15 有偿求跨组件数据流路径图
  • ¥15 写一个方法checkPerson,入参实体类Person,出参布尔值
  • ¥15 我想咨询一下路面纹理三维点云数据处理的一些问题,上传的坐标文件里是怎么对无序点进行编号的,以及xy坐标在处理的时候是进行整体模型分片处理的吗
  • ¥15 CSAPPattacklab
  • ¥15 一直显示正在等待HID—ISP
  • ¥15 Python turtle 画图
  • ¥15 stm32开发clion时遇到的编译问题
  • ¥15 lna设计 源简并电感型共源放大器