dongzhi9574 2019-07-13 08:11

Preventing the PHP allow_url_fopen = 0 flag from being bypassed with cURL

I have configured allow_url_fopen = 0 to block scraping tools. The setting is applied globally, and I do not allow it to be overridden in a local php.ini file. However, I've noticed that the flag can be bypassed if the scraping tool is based on cURL. Look at the page copier function below: using it, I successfully copied a page from a server configured with allow_url_fopen = 0.
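From what I understand, this is because allow_url_fopen only governs PHP's URL-aware stream wrappers (fopen(), file_get_contents(), and friends); the cURL extension never consults the flag. A minimal illustration of the difference (the URL is a placeholder):

// With allow_url_fopen = 0, a stream-wrapper fetch fails:
$html = @file_get_contents('http://example.com/'); // returns false with a warning

// The cURL extension is unaffected by the flag:
$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch); // succeeds even with allow_url_fopen = 0
curl_close($ch);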

public function handle()
{
    try {
        if ( ini_get('allow_url_fopen') ) {
            Log::info('Flag allow_url_fopen is enabled');
            // Let Htmldom fetch the page itself through a stream wrapper
            $html = new Htmldom('page_url_here');
        } else {
            Log::info('Flag allow_url_fopen is disabled, trying with cURL');
            // Fetch the page with cURL instead and hand the raw HTML to the parser
            $webpage = EventCron::get_web_page('page_url_here');
            $html = new Htmldom($webpage['content']);
        }
        /* Doing some magical stuff with the site content */
        $agenda = $html->find('div.articles', 0);

        Log::info('success');
    } catch (\Exception $e) {
        Log::error('Event Cron Error ' . $e->getMessage());
    }
}

public static function get_web_page( $url, $cookiesIn = '' ){
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the transfer as a string
        CURLOPT_HEADER         => true,     // include response headers in the output
        CURLOPT_FOLLOWLOCATION => true,     // follow any Location: redirects
        CURLOPT_ENCODING       => "",       // accept every encoding cURL supports
        CURLOPT_AUTOREFERER    => true,     // set the Referer header on redirects
        CURLOPT_CONNECTTIMEOUT => 120,      // connection timeout, in seconds
        CURLOPT_TIMEOUT        => 120,      // overall transfer timeout, in seconds
        CURLOPT_MAXREDIRS      => 10,       // give up after 10 redirects
        CURLINFO_HEADER_OUT    => true,     // keep the outgoing request headers
        CURLOPT_SSL_VERIFYPEER => true,     // verify the peer's TLS certificate
        CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1,
        CURLOPT_COOKIE         => $cookiesIn
    );

    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $rough_content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );

    // Split the raw response into headers and body using the reported header size
    // (substr is safer than str_replace, which could also match inside the body)
    $header_content = substr($rough_content, 0, $header['header_size']);
    $body_content = trim(substr($rough_content, $header['header_size']));

    // Collect Set-Cookie values so they can be replayed via $cookiesIn next time
    $pattern = "#Set-Cookie:\\s+(?<cookie>[^=]+=[^;]+)#m";
    preg_match_all($pattern, $header_content, $matches);
    $cookiesOut = implode("; ", $matches['cookie']);

    $page['errno'] = $err;
    $page['errmsg'] = $errmsg;
    $page['headers'] = $header_content;
    $page['content'] = $body_content;
    $page['cookies'] = $cookiesOut;
    return $page;
}
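
For clarity, the helper returns an associative array, so a standalone call looks like this (the URL is a placeholder):

$page = EventCron::get_web_page('http://example.com/');
if ($page['errno'] === 0) {
    echo $page['content'];                  // body with the headers stripped off
} else {
    echo 'cURL error: ' . $page['errmsg'];
}
// $page['cookies'] can be passed back in as $cookiesIn on a follow-up request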

Now the question is: how can I prevent the page from being scraped by a cron job like this? If there is no way to do so, then this is arguably a security issue in PHP. I did find one alternative, namely preventing it by disabling the cURL library, but that is not a proper solution: some of my hosted projects require cURL, since it is one of the most widely used and popular libraries among web developers.
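For reference, by "disabling the cURL library" I mean a global php.ini change along these lines (the exact function list is illustrative):

; php.ini (global) - blocks the cURL entry points for every hosted script
disable_functions = curl_init, curl_exec, curl_multi_exec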
