duanjue2560 2015-08-30 20:44
浏览 23

抓第二页答案; 卷曲

I'm crawling a few websites, everything it's working fine,

but .... I have one specific website that I'm trying to crawl, and it's making a few "redirects" before landing to the web I want.

So it's something like ...

http://www.example.com/?day=01/01/2016&action=search_prices

this will go to http://www.example.com/search/default.aspx take a few seconds to search the answer page and then show it on there.

Is there any way to easily do this? any hint, clue, etc would be awesome

Simple code right now (almost all the sites I was crawling were jsons):

function get_web_page( $url ){
        $options = array(
            CURLOPT_RETURNTRANSFER => true,     // return web page
            CURLOPT_HEADER         => false,    // don't return headers
            CURLOPT_FOLLOWLOCATION => true,     // follow redirects
            CURLOPT_ENCODING       => "",       // handle all encodings
            CURLOPT_USERAGENT      => "spider", // who am i
            CURLOPT_AUTOREFERER    => true,     // set referer on redirect
            CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect

            CURLOPT_HTTPHEADER     => array('HeaderName: HeaderValue'),

            CURLOPT_TIMEOUT        => 120,      // timeout on response
            CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
            CURLOPT_SSL_VERIFYPEER => false     // Disabled SSL Cert checks
        );

        $ch      = curl_init( $url );
        curl_setopt_array( $ch, $options );
        $content = curl_exec( $ch );
        $err     = curl_errno( $ch );
        $errmsg  = curl_error( $ch );
        $header  = curl_getinfo( $ch );
        curl_close( $ch );

        $header['errno']   = $err;
        $header['errmsg']  = $errmsg;
        $header['content'] = $content;
        return $header;
}
  • 写回答

0条回答 默认 最新

    报告相同问题?

    悬赏问题

    • ¥100 任意维数的K均值聚类
    • ¥15 stamps做sbas-insar,时序沉降图怎么画
    • ¥15 unity第一人称射击小游戏,有demo,在原脚本的基础上进行修改以达到要求
    • ¥15 买了个传感器,根据商家发的代码和步骤使用但是代码报错了不会改,有没有人可以看看
    • ¥15 关于#Java#的问题,如何解决?
    • ¥15 加热介质是液体,换热器壳侧导热系数和总的导热系数怎么算
    • ¥100 嵌入式系统基于PIC16F882和热敏电阻的数字温度计
    • ¥15 cmd cl 0x000007b
    • ¥20 BAPI_PR_CHANGE how to add account assignment information for service line
    • ¥500 火焰左右视图、视差(基于双目相机)