duancu4731 2019-02-22 05:48

Getting the redirected URL with cURL

I am trying to find out where a URL redirects to. I tried two functions for this, but neither works properly.

Here is the link. When you open it, you are redirected:

https://lions-mansion.jp/MA141070/

First I tried cURL with CURLOPT_FOLLOWLOCATION:

function redirect1($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $data = curl_exec($ch);
    $data = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);

    return $data;
}

I also tried reading the Location header directly:

function redirect($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    if (preg_match('~Location: (.*)~i', $result, $match)) {
        $location = trim($match[1]);
    }

    return $result;
}

But neither function gives me the redirected URL.

2 answers

  • douliudong8108 2019-03-02 07:45

    This page does not use a redirect mechanism that libcurl understands. It uses an HTML <meta http-equiv="REFRESH"> redirect, which libcurl does not support. As a result, libcurl can neither tell you where the page redirects to nor follow the redirect automatically.

    You need to parse the redirect URL out of the HTML yourself, e.g.:

    function redirect1($url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Follows HTTP 3xx redirects only; a <meta> refresh is not followed.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        // Disabling certificate verification is insecure; kept from the question's code.
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        $data = curl_exec($ch);
        curl_close($ch);
        if ($data === false) {
            return null; // the request itself failed
        }
        // loadHTML() is an instance method; calling it statically fails on PHP 8.
        $domd = new DOMDocument();
        @$domd->loadHTML($data); // @ suppresses warnings about malformed HTML
        $xp = new DOMXPath($domd);
        // <META http-equiv="REFRESH" content="0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo" />
        $meta = $xp->query("//meta[@http-equiv='REFRESH']")->item(0);
        if ($meta === null) {
            return null; // no meta refresh tag on this page
        }
        $location = $meta->getAttribute("content");
        // 0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo
        $location = substr($location, stripos($location, 'URL=') + 4);
        return $location;
    }
    var_dump(redirect1('https://lions-mansion.jp/MA141070/'));
    

    Output:

    C:\projects\misc>php re.php
    string(57) "http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo"
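
The XPath query above matches only the exact attribute value 'REFRESH', so a page that writes http-equiv="Refresh" or "refresh" would be missed. As a rough sketch (not part of the original answer), the parsing step could be pulled out into a helper that works on an HTML string and compares the attribute case-insensitively. The function name metaRefreshTarget and the example.com URL are assumptions made up for illustration.

```php
<?php
// Hypothetical helper: return the target URL of a <meta http-equiv="refresh">
// tag found in $html, or null when the page has no such tag.
function metaRefreshTarget(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    // XPath 1.0 has no lower-case(); translate() folds the letters we need.
    $query = "//meta[translate(@http-equiv, 'REFSH', 'refsh') = 'refresh']";
    $meta = $xp->query($query)->item(0);
    if (!$meta instanceof DOMElement) {
        return null;
    }
    // content looks like "0;URL=http://example.com/" (the URL may be quoted)
    if (!preg_match('/url\s*=\s*(.+)$/i', $meta->getAttribute('content'), $m)) {
        return null;
    }
    return trim($m[1], " \t\n\r'\"");
}

// Usage with a made-up page:
$html = '<html><head><META http-equiv="Refresh" content="0;URL=http://example.com/next?x=1"></head></html>';
var_dump(metaRefreshTarget($html));
```

In redirect1(), passing the body returned by curl_exec() to such a helper would stand in for the DOMXPath and substr() lines. The returned target may be a relative URL, in which case it still has to be resolved against the page's own address.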
    
    This answer was selected by the asker as the best answer.