duancu4731 2019-02-22 05:48
Answer accepted

Fetching the redirected URL with cURL

I am trying to find out where I'll be redirected to. I tried two functions for this, but neither of them works properly.

The link is below. When you open it, you are redirected:

https://lions-mansion.jp/MA141070/

So I tried using cURL:

function redirect1($url) { 
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        $data = curl_exec($ch);
        $data = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );
        curl_close($ch);

        return $data;
    }

And also this:

function redirect($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $result = curl_exec($ch);
        if (preg_match('~Location: (.*)~i', $result, $match)) {
           $location = trim($match[1]);
        }

        return $result;
    }

But I couldn't find the redirected URL with either of them.

2 answers

  • douliudong8108 2019-03-02 07:45

    This page does not use a redirect scheme that libcurl understands. It redirects with an HTML <meta http-equiv="REFRESH"> tag, which libcurl does not support, so libcurl can neither tell you where the page redirects to nor follow the redirect automatically.

    You need to parse the redirect URL out of the HTML yourself, e.g.:

    function redirect1($url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Still follow ordinary HTTP 3xx redirects before looking at the HTML.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        // CURLOPT_BINARYTRANSFER dropped: it has had no effect since PHP 5.1.3.
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        $data = curl_exec($ch);
        curl_close($ch);
        if ($data === false || $data === '') {
            return null; // the request failed or returned an empty body
        }
        // loadHTML() is an instance method; @ silences warnings about malformed HTML.
        $domd = new DOMDocument();
        @$domd->loadHTML($data);
        $xp = new DOMXPath($domd);
        // <META http-equiv="REFRESH" content="0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo" />
        // Note: this XPath only matches the upper-case value "REFRESH" that this page uses.
        $meta = $xp->query("//meta[@http-equiv='REFRESH']")->item(0);
        if ($meta === null) {
            return null; // no meta refresh tag on the page
        }
        $location = $meta->getAttribute("content");
        // 0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo
        $location = substr($location, stripos($location, 'URL=') + 4);
        return $location;
    }
    var_dump(redirect1('https://lions-mansion.jp/MA141070/'));
    

    output:

    C:\projects\misc>php re.php
    string(57) "http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo"
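
The parsing step above can also be kept separate from the network call, which makes it easier to test. The following is only a minimal sketch under some assumptions: the helper name metaRefreshTarget is hypothetical (it is not part of the answer or of any library), it compares the http-equiv value case-insensitively so that REFRESH, Refresh and refresh all match, and it assumes the content attribute has the usual "delay;URL=target" shape.

```php
<?php
// Hypothetical helper (not from the original answer): return the target of an
// HTML meta refresh found in $html, or null if there is none.
function metaRefreshTarget(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('meta') as $meta) {
        // Compare the attribute value without regard to case.
        if (strcasecmp(trim($meta->getAttribute('http-equiv')), 'refresh') !== 0) {
            continue;
        }
        // Assumed shape: optional delay, a separator, an optional "URL=", then the target.
        $content = $meta->getAttribute('content');
        if (preg_match('/^\s*\d*(?:\.\d*)?\s*[;,]\s*(?:url\s*=\s*)?(.+)$/is', $content, $m)) {
            // Strip surrounding whitespace and optional quotes around the target.
            return trim($m[1], " \t\n\r\"'");
        }
    }

    return null; // no meta refresh with a target was found
}
```

It could be called on the body returned by curl_exec(), for example metaRefreshTarget($data). The target of a meta refresh may be a relative URL, in which case it would still need to be resolved against the URL reported by CURLINFO_EFFECTIVE_URL; that step is not shown here.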
    
    This answer was selected by the asker as the best answer.