How do I create a function that repeats itself until everything has been found?

I want to create a PHP function that starts at a website's homepage and finds all the links on it. It should then follow each of those links and keep going until every link on the site has been found. I really need something like this so I can spider my network of sites and provide a "one stop" search.

Here's what I have so far:

function spider($urltospider, $current_array = array(), $ignore_array = array('')) {
    if(empty($current_array)) {
        // Make the request to the original URL
        $session = curl_init($urltospider);
        curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
        $html = curl_exec($session);
        curl_close($session);
        if($html != '') {
            $dom = new DOMDocument();
            @$dom->loadHTML($html);
            $xpath = new DOMXPath($dom);
            $hrefs = $xpath->evaluate("/html/body//a");
            for($i = 0; $i < $hrefs->length; $i++) {
                $href = $hrefs->item($i);
                $url = $href->getAttribute('href');
                if(!in_array($url, $ignore_array) && !in_array($url, $current_array)) {
                    // Add this URL to the current spider array
                    $current_array[] = $url;
                }
            }               
        } else {
            die('Failed connection to the URL');
        }
    } else {
        // There are already URLs in the current array
        foreach($current_array as $url) {
            // Connect to this URL

            // Find all the links in this URL

            // Go through each URL and get more links
        }
    }
}

The problem is that I can't work out how to proceed from here. Can anyone help me out? Basically, this function should keep repeating itself until everything has been found.

4 answers

dpi10335 2010-07-18 06:11

I'm not a PHP expert, but you seem to be overcomplicating it.

function spider($urltospider, $current_array = array(), $ignore_array = array('')) {
    if (empty($current_array)) {
        $current_array[] = $urltospider;
    }
    $cur_crawl = 0;
    // Don't use foreach here: it loops over a copy of the array, so URLs appended inside the loop would never be crawled.
    while ($cur_crawl < count($current_array)) {
        // crawl() should return all links found on the given page
        $links_found = crawl($current_array[$cur_crawl]);
        // Only queue links that are new and not ignored, so no page is crawled twice
        // (without this check, pages that link to each other keep the loop running forever)
        foreach ($links_found as $link) {
            if (!in_array($link, $current_array) && !in_array($link, $ignore_array)) {
                $current_array[] = $link;
            }
        }
        $cur_crawl++;
    }
    return $current_array;
}
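The answer leaves `crawl()` undefined. Below is a minimal sketch of one way to write it by reusing the question's cURL and DOMXPath code. The following parts are assumptions, not from the original post: the hypothetical `extract_links()` helper, the `$allowed_hosts` parameter (which defaults to the crawled page's own host so the spider stays on your sites), the resolution of root-relative links such as `/about`, and the choice to skip other relative links such as `contact.html` for brevity.

```php
<?php
// Hypothetical helpers for the crawl() function the answer assumes.
// Simplifications (assumptions, not from the original post):
// - only http(s) links whose host is in $allowed_hosts are kept;
// - root-relative ("/about") and protocol-relative ("//host/x") links are resolved;
// - other relative links ("contact.html") are skipped rather than resolved.

function extract_links(string $html, string $page_url, array $allowed_hosts): array {
    $base = parse_url($page_url); // assumes $page_url is absolute
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $links = array();

    $dom = new DOMDocument();
    @$dom->loadHTML($html); // silence warnings from malformed HTML, as in the question
    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//a[@href]') as $a) {
        // Drop any #fragment so "page#top" and "page" count as the same URL
        $href = explode('#', trim($a->getAttribute('href')), 2)[0];
        if ($href === '') {
            continue;
        }
        if (strpos($href, '//') === 0) {
            $href = $base['scheme'] . ':' . $href;   // protocol-relative
        } elseif ($href[0] === '/') {
            $href = $origin . $href;                 // root-relative
        }
        $parts = parse_url($href);
        // Skips relative paths, mailto:, javascript: and other links without a host
        if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
            continue;
        }
        if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)
            || !in_array(strtolower($parts['host']), $allowed_hosts, true)) {
            continue;
        }
        if (!in_array($href, $links, true)) {
            $links[] = $href;
        }
    }
    return $links;
}

function crawl(string $url, ?array $allowed_hosts = null): array {
    // Default: stay on the host of the page being crawled
    if ($allowed_hosts === null) {
        $allowed_hosts = array(strtolower((string) parse_url($url, PHP_URL_HOST)));
    }
    $session = curl_init($url);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($session, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($session);
    curl_close($session);
    // A failed request yields no links instead of die()-ing mid-crawl
    if (!is_string($html) || $html === '') {
        return array();
    }
    return extract_links($html, $url, $allowed_hosts);
}
```

To spider several of your own sites, pass all of their host names as `$allowed_hosts`. Without some host filter, the loop would follow external links off your network.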

This answer was selected by the asker as the best answer.
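The accepted answer's loop stops once no unvisited URLs remain. That only holds if links that have already been found are never queued again. The sketch below checks this without any network access. It uses the same worklist loop, but the link source is passed in as a callable (`spider_with()` and `$fetch` are hypothetical names). It also uses a made-up four-page site whose pages link back to each other.

```php
<?php
// Sketch of the accepted answer's worklist loop with an injectable link source.
// $fetch_links is a hypothetical parameter; in real use it could be crawl().
function spider_with(string $start_url, callable $fetch_links, array $ignore_array = array()): array {
    $queue = array($start_url);          // every URL found, in discovery order
    $seen = array($start_url => true);   // constant-time "already found?" lookup
    // count($queue) is re-evaluated each pass, so newly appended URLs get visited too
    for ($i = 0; $i < count($queue); $i++) {
        foreach ($fetch_links($queue[$i]) as $link) {
            if (!isset($seen[$link]) && !in_array($link, $ignore_array, true)) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $queue;
}

// A tiny in-memory "site" whose pages link back to each other (cycles).
$site = array(
    '/'    => array('/a', '/b'),
    '/a'   => array('/', '/b', '/a/1'),
    '/b'   => array('/'),
    '/a/1' => array('/a'),
);
$fetch = function ($url) use ($site) {
    return isset($site[$url]) ? $site[$url] : array();
};

$pages = spider_with('/', $fetch);
print_r($pages); // lists /, /a, /b, /a/1 in discovery order, then stops
```

Keying `$seen` by URL avoids scanning the whole list with `in_array()` for every link. That starts to matter once the spider has found thousands of pages.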
