qq_41543706
匆匆走开
采纳率100%
2019-04-02 19:59 阅读 597
已采纳

PHP循环抓取多个URL页面时,抓着抓着就停止了,该怎么办?

PHP抓取页面中途停止怎么办如果我下次想从停止的地方抓取的话该怎么做?

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享

3条回答 默认 最新

  • 已采纳
    bairuijin 请稍后再查看 2019-04-03 10:32

    set_time_limit(0); 加上这个可以循环完

    点赞 评论 复制链接分享
  • caozhy 从今以后生命中的每一秒都属于我爱的人 2019-04-03 00:36

    将遍历的页面的地址保存到数据库或者文件里。下次运行的时候,据此设置为循环开始的值。

    点赞 评论 复制链接分享
  • qq_41543706 匆匆走开 2019-04-03 11:02
    #!/usr/bin/php
    #--*-- coding: utf8 --*--
    <?php
    set_time_limit(0); 
    error_reporting(E_ALL^E_NOTICE);
    $nextUrl = "GEN.1";
    while(!empty($nextUrl)){
        $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13';
        $ch= curl_init();
        curl_setopt($ch, CURLOPT_URL,"https://wdbible.com/api/bible/chapterhtml/cunps/{$nextUrl}");
        curl_setopt($ch, CURLOPT_HEADER,0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_USERAGENT,$userAgent);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
        $data = curl_exec($ch);
    //  curl_close($ch);
        if(!empty($data)){
            $data = json_decode($data,true);
            $content = $data['data']['content'];
        }else{
            echo "{$nextUrl}章节访问失败,重新访问。。。\r\n";
            $data = curl_exec($ch);
        }
        $file1 = "D:/cmd/xml/{$nextUrl}.xml";
        file_put_contents($file1,$content);
        echo "{$nextUrl}.xml生成成功。\r\n";
        $file = "D:/cmd/txt/{$nextUrl}.txt";
        $stack = array();
        $top = -1;
        $xmlParser = xml_parser_create();
        xml_set_element_handler($xmlParser,"Start","Stop");     
        xml_set_character_data_handler($xmlParser,"char");  
        $fp = fopen("$file1","r");  
        while($row = fread($fp,10000)){     
                xml_parse($xmlParser,$row) or 
                    die(xml_error_string(xml_get_error_code($xmlParser), 
                    xml_get_current_line_number($xmlParser)));
        }   
        xml_parser_free($xmlParser);
        echo "{$nextUrl}章节抓取成功。。。\r\n";
        $nextUrl = $data['data']['nextChapterUsfm'];
        if(!empty($nextUrl)){
            echo "读取下一章节。。。\r\n";
        }else{
            echo "下一章节路径获取不到,重新获取。。。\r\n";
            $nextUrl = $data['data']['nextChapterUsfm'];
        }
    }
    echo "抓取结束。。。。。。\r\n";  
    function Start($parser, $element_name, $element_attr){
        global $top,$stack;
        if($element_name == "DIV" && count($element_attr) == 1){
            $top++;
            array_push($stack,$element_name);
            $top++;
            array_push($stack,$element_attr);
        }else{
            $top++;
            array_push($stack,$element_name);
        }
    }
    function Stop($parser, $element_name){  
        global $top,$stack,$file;
        switch($element_name){
            case "H6" :
                file_put_contents($file,"\r\n",FILE_APPEND);
                array_pop($stack);
                $top--;
                array_pop($stack);
                $top--;
                break;
            case "H5" :
                file_put_contents($file,"\r\n",FILE_APPEND);
                array_pop($stack);
                $top--;
                array_pop($stack);
                $top--;
                break;
            case "MARK" :
                array_pop($stack);
                $top--;
                break;
            case "SPAN" :
                array_pop($stack);
                $top--;
                break; 
            case "li" :
                file_put_contents($file,"\r\n",File_APPEND);
                array_pop($stack);
                $top--;
                break;
            case "DIV" :
                if($stack[$top] == "DIV"){
                    array_pop($stack);
                    $top--;
                }else{
                    file_put_contents($file,"\r\n",FILE_APPEND);
                    array_pop($stack);
                    $top--;
                    array_pop($stack);
                    $top--;
                }
                break;
            case "P" :
                array_pop($stack);
                $top--;
        }
    }
    function char($parser, $data1){
        global $top,$stack,$file;
        if (strlen(trim($data1)) > 0){                  
            file_put_contents($file,$data1,FILE_APPEND);
        }   
    }
    ?>
    
    
    
    
    

    总是搞着搞着,下一个URL就访问不到了。。
    图片说明

    有一个这样的警告。。

    点赞 评论 复制链接分享

相关推荐