duancu4731 2019-02-22 05:48
Accepted

Using cURL to fetch the URL a page redirects to

I am trying to find out where a URL redirects to. I tried two functions for this, but neither of them works properly.

The link is below. When you open it, you are redirected:

https://lions-mansion.jp/MA141070/

So I tried using cURL:

function redirect1($url) { 
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        $data = curl_exec($ch);
        $data = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );
        curl_close($ch);

        return $data;
    }

And also this:

function redirect($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $result = curl_exec($ch);
        if (preg_match('~Location: (.*)~i', $result, $match)) {
           $location = trim($match[1]);
        }

        return $result;
    }

But neither one gives me the redirected URL.


2 answers

  • douliudong8108 2019-03-02 07:45

    This page does not use a redirect mechanism that libcurl understands. It uses an HTML <meta http-equiv="REFRESH"> redirect, which libcurl does not support. As a result, libcurl can neither tell you where the page redirects to nor follow the redirect automatically.
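    For contrast, the kind of redirect libcurl does follow is an HTTP 3xx response carrying a Location header. The sketch below pulls that header out of raw response headers. The name parseLocationHeader is hypothetical, and the sample header text is made up for illustration; it is not taken from the page in the question.

```php
<?php
// Hypothetical helper (not part of cURL): returns the value of the Location
// header found in raw HTTP response headers, or null if there is none.
function parseLocationHeader(string $rawHeaders): ?string
{
    // Match "Location:" at the start of a line, case-insensitively,
    // and capture everything up to the end of that line.
    if (preg_match('~^Location:[ \t]*(.*?)[ \t]*\r?$~im', $rawHeaders, $m)) {
        return $m[1] !== '' ? $m[1] : null;
    }
    return null;
}

// Made-up sample headers for illustration only.
$sample = "HTTP/1.1 302 Found\r\nContent-Type: text/html\r\nLocation: https://example.com/next\r\n\r\n";
var_dump(parseLocationHeader($sample));
```

    With CURLOPT_HEADER enabled, the header part of the response can be cut off with substr($result, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE)) before passing it in. For the page in the question this would return null, since the redirect lives in the HTML body rather than in the headers.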

    You need to parse the redirect URL out of the HTML yourself, e.g.:

    function redirect1($url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Still follow ordinary HTTP redirects before looking at the HTML.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 60);
        $data = curl_exec($ch);
        curl_close($ch);
        if ($data === false || $data === '') {
            return null; // request failed or returned an empty body
        }
        // loadHTML() is an instance method; @ hides warnings about malformed HTML.
        $domd = new DOMDocument();
        @$domd->loadHTML($data);
        $xp = new DOMXPath($domd);
        // <META http-equiv="REFRESH" content="0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo" />
        $meta = $xp->query("//meta[@http-equiv='REFRESH']")->item(0);
        if ($meta === null) {
            return null; // no meta refresh tag on this page
        }
        $location = $meta->getAttribute("content");
        // 0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo
        $location = substr($location, stripos($location, 'URL=') + 4);
        return $location;
    }
    var_dump(redirect1('https://lions-mansion.jp/MA141070/'));
    

    Output:

    C:\projects\misc>php re.php
    string(57) "http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo"
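    The parsing step can also be separated from the network call so it can be tried on any HTML string. The following is a minimal sketch under a few assumptions: extractMetaRefreshUrl is a hypothetical name, the http-equiv value is compared case-insensitively, and the content attribute is assumed to have the usual "delay;URL=target" shape. It does not resolve relative URLs.

```php
<?php
// Hypothetical helper: extracts the target URL of a <meta http-equiv="refresh">
// tag from an HTML string, or returns null when no such tag is found.
function extractMetaRefreshUrl(string $html): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since real-world HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('meta') as $meta) {
        // Accept "refresh", "REFRESH", "Refresh", ...
        if (strcasecmp(trim($meta->getAttribute('http-equiv')), 'refresh') !== 0) {
            continue;
        }
        // content is assumed to look like "0;URL=http://example.com/".
        $content = $meta->getAttribute('content');
        $pattern = '~^\s*[\d.]*\s*[;,]\s*(?:url\s*=\s*)?["\']?(.*?)["\']?\s*$~i';
        if (preg_match($pattern, $content, $m) && $m[1] !== '') {
            return $m[1];
        }
    }
    return null;
}

// The meta tag quoted in the answer above, wrapped in a minimal page.
$html = '<html><head><META http-equiv="REFRESH" content="0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo" /></head><body></body></html>';
var_dump(extractMetaRefreshUrl($html));
```

    In redirect1() this would take the place of the DOMXPath lookup: pass the string returned by curl_exec() to the helper and return its result.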
    
    Accepted by the asker as the best answer.