dongtan3306 2019-03-11 08:14
Views: 76

Scraping a secure page with PHP cURL?

I am trying to scrape a page with PHP cURL, but I get a timeout error every time I request the URL. The URL opens in a browser, but not through my PHP cURL request.

My request is identical to the one the browser sends. I used Burp Suite to capture the request and response, and I set the same headers the browser sends.

I assume this happens because my server's IP address is different from what the site expects.

Could anyone tell me why this might happen? I don't know much about networking, so I am struggling to scrape the page.

Additionally, the page changes its URL with JavaScript after it loads successfully. For example, http://example.tld/page?p1=234&p2=532 becomes http://example.tld/api/page. I want to know whether this could be the reason, or whether it is the server IP or something else.

Below is the code I am using.

function get_web_page( $url ){
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => true,     // include response headers in the output
        CURLOPT_FOLLOWLOCATION => false,    // do not follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Mobile Safari/537.36", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 30,      // timeout on connect

        CURLOPT_HTTPHEADER     => array(
            "Pragma: no-cache",
            "Cache-Control: no-cache",
            "Upgrade-Insecure-Requests: 1",
            "User-Agent: Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Mobile Safari/537.36",
            "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Encoding: gzip, deflate",
            "Accept-Language: en-US,en;q=0.9",
            "Cookie: JSESSIONID=0C072792B81AAAC43110DE7106E4F30C", 
            "Connection: close",
        ),
        CURLOPT_TIMEOUT        => 30,       // maximum time for the whole transfer
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
        CURLOPT_SSL_VERIFYPEER => false,    // Disabled SSL Cert checks
    );
    /*$last_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);.*/
    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;    
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
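
To narrow down whether the timeout comes from the connection itself (for example an IP block or firewall) or from the server stalling after the connection is made, the array returned by get_web_page() can be inspected. The following is only a rough sketch: diagnose_curl_result() and its return labels are hypothetical names that are not part of the code above, and the numeric codes are the standard libcurl error numbers. It assumes that connect_time usually stays at 0 when the TCP connection was never established.

```php
<?php
// Hypothetical helper (not part of the original code): turns the array
// returned by get_web_page() into a rough guess about where the request failed.
function diagnose_curl_result(array $info): string
{
    $errno   = (int)   ($info['errno'] ?? 0);
    $connect = (float) ($info['connect_time'] ?? 0.0);
    $status  = (int)   ($info['http_code'] ?? 0);

    if ($errno === 6) {   // CURLE_COULDNT_RESOLVE_HOST
        return 'dns';
    }
    if ($errno === 7) {   // CURLE_COULDNT_CONNECT
        return 'refused';
    }
    if ($errno === 28) {  // CURLE_OPERATION_TIMEDOUT
        // Assumption: connect_time is 0 when the TCP connection never completed,
        // which points at the network path (IP block, firewall) rather than headers.
        return $connect > 0.0 ? 'server-stalled' : 'connect-blocked';
    }
    if ($errno !== 0) {
        return 'other-error';
    }
    if ($status >= 300 && $status < 400) {
        // CURLOPT_FOLLOWLOCATION is false above, so redirects are returned as-is.
        return 'redirect';
    }
    return $status >= 400 ? 'http-error' : 'ok';
}

// Usage with the function from the question:
// echo diagnose_curl_result(get_web_page('http://example.tld/page?p1=234&p2=532'));
```

A result of 'connect-blocked' would be consistent with the server IP theory. The JavaScript URL change happens in the browser only after the page has loaded, so on its own it would not be expected to make the initial request time out.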