duancu4731 2019-02-22 05:48

Using cURL to get the URL a page redirects to

I am trying to find out where a URL redirects to. I tried two functions for this, but neither of them works properly.

The link is below. When you open it, you are redirected:

https://lions-mansion.jp/MA141070/

So I tried using cURL:

function redirect1($url) { 
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        $data = curl_exec($ch);
        $data = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );
        curl_close($ch);

        return $data;
    }

I also tried this:

function redirect($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $result = curl_exec($ch);
        if (preg_match('~Location: (.*)~i', $result, $match)) {
           $location = trim($match[1]);
        }

        return $result;
    }

But neither function gave me the redirected URL.

2 answers

  • douliudong8108 2019-03-02 07:45

    This page does not use a redirect mechanism that libcurl understands. It uses an HTML <meta http-equiv="REFRESH"> redirect, which libcurl does not support, so libcurl can neither tell you where the page redirects to nor follow the redirect automatically.

    You need to parse the redirect URL out of the HTML yourself, e.g.:

    function redirect1($url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Follows HTTP 3xx redirects only; the <meta> refresh is handled below.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        // Disables certificate verification, as in the question; avoid this in production.
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        $data = curl_exec($ch);
        curl_close($ch);
        if (!is_string($data) || $data === '') {
            return null; // request failed or returned an empty body
        }
        // loadHTML() is an instance method; calling it statically fails on PHP 8.
        $domd = new DOMDocument();
        @$domd->loadHTML($data);
        $xp = new DOMXPath($domd);
        // <META http-equiv="REFRESH" content="0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo" />
        $meta = $xp->query("//meta[@http-equiv='REFRESH']")->item(0);
        if ($meta === null) {
            return null; // no meta refresh tag found
        }
        $location = $meta->getAttribute("content");
        // 0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo
        $location = substr($location, stripos($location, 'URL=') + 4);
        return $location;
    }
    var_dump(redirect1('https://lions-mansion.jp/MA141070/'));
    

    output:

    C:\projects\misc>php re.php
    string(57) "http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo"
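
The parsing step can also be separated from the HTTP request, which makes it testable without network access. The following is only a sketch under a few assumptions: the function name `metaRefreshUrl` is hypothetical (it is not part of the answer above or of any library), the `http-equiv` value is matched case-insensitively, and the URL inside `content` may be wrapped in quotes. It returns the URL exactly as written in the tag, so a relative URL is not resolved against the page address.

```php
<?php
// Hypothetical helper (not from the original answer): returns the target URL
// of the first <meta http-equiv="refresh"> tag in $html, or null if none is found.
function metaRefreshUrl(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath 1.0 has no lower-case(), so translate() is used to match
    // "REFRESH", "Refresh" and "refresh" alike.
    $xpath = new DOMXPath($dom);
    $query = "//meta[translate(@http-equiv, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
           . "'abcdefghijklmnopqrstuvwxyz') = 'refresh']";
    $meta = $xpath->query($query)->item(0);
    if (!$meta instanceof DOMElement) {
        return null;
    }

    // The content attribute looks like "0;URL=http://example.com/".
    if (!preg_match('/url\s*=\s*(.+)$/is', $meta->getAttribute('content'), $m)) {
        return null;
    }

    // Strip surrounding whitespace and optional quotes around the URL.
    $url = trim($m[1], " \t\n\r\0\x0B'\"");
    return $url === '' ? null : $url;
}
```

With a helper like this, `redirect1()` would only need to fetch the body with cURL and pass it on, for example `return metaRefreshUrl($data);`, and a `null` result would mean the page contained no meta refresh tag.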
    
    This answer was accepted by the asker as the best answer.