doskmc7870
2012-04-23 20:53
浏览 56
已采纳

PHP Curl重定向后

I'm trying to be a bit sneeky and as part of a learning process try and improve my page scraping skills.

One thing i've come across that I have yet to be able to solve is that certain sites will use an internal link which then redirects to an external link.

What I want to do is modify some curl code to follow the redirects until they stop and then obtain the final resting place URL.

Anyone recommend some code for me?

I have this at the moment, but it's not following the redirects properly at the moment.

        $opts = array(CURLOPT_URL => $url,
                      CURLOPT_RETURNTRANSFER => true,
                      CURLOPT_HEADER => true,
                      CURLOPT_FOLLOWLOCATION => true);      

        $curl = curl_init(); 
        curl_setopt_array($curl, $opts);  
        $str = curl_exec($curl);  
        curl_close($curl);  
  • 写回答
  • 好问题 提建议
  • 追加酬金
  • 关注问题
  • 邀请回答

2条回答 默认 最新

  • dongshengheng1013 2012-04-23 20:59
    最佳回答

    http.//php.net/manual/en/ref.curl.php

       function get_final_url( $url, $timeout = 5 )
     {
        $url = str_replace( "&", "&", urldecode(trim($url)) );
    
       $cookie = tempnam ("/tmp", "CURLCOOKIE");
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_ENCODING, "" );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
    curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
    $content = curl_exec( $ch );
    $response = curl_getinfo( $ch );
    curl_close ( $ch );
    
    if ($response['http_code'] == 301 || $response['http_code'] == 302)
    {
        ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
        $headers = get_headers($response['url']);
    
        $location = "";
        foreach( $headers as $value )
        {
            if ( substr( strtolower($value), 0, 9 ) == "location:" )
                return get_final_url( trim( substr( $value, 9, strlen($value) ) ) );
        }
    }
    
    if (    preg_match("/window\.location\.replace\('(.*)'\)/i", $content, $value) ||
            preg_match("/window\.location\=\"(.*)\"/i", $content, $value)
    )
    {
        return get_final_url ( $value[1] );
    }
    else
    {
        return $response['url'];
       }
    }
    
    评论
    解决 无用
    打赏 举报
查看更多回答(1条)

相关推荐 更多相似问题