douqiju2520 2015-01-06 09:24

About PHP cURL and the curl command

PHP cURL returns a blank page for some sites where the curl command works fine.

For example, curl www.wikipedia.org produces output, but PHP cURL gives a blank page containing only <html> tags.

$ch = curl_init(); // create the cURL handle; the URL is set further down
// TBD: every curl_setopt() call returns true/false; these should be checked
//curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // set user agent (currently disabled)
$res = curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait while connecting
curl_setopt($ch, CURLOPT_FAILONERROR, 0); // 0 = do not treat HTTP status >= 400 as a failure
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // iNetClean is web-crawling, no need to verify certificates
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false); // also skip the certificate host name check
curl_setopt($ch, CURLOPT_ENCODING, ""); // accept all supported encodings - identity, deflate, gzip
// now set the URL
curl_setopt($ch, CURLOPT_URL, $url);

$fp_out = fopen($html_file, 'w');
if (!$fp_out) {
    if ($DEBUG) {
        error_log("couldn't create " . $html_file);
    }
} else {
    if ($DEBUG) {
        error_log("file created " . $html_file);
    }
}

$fp_err = fopen($html_err_file, 'w');
if (!$fp_err) {
    if ($DEBUG) {
        error_log("couldn't create " . $html_err_file);
    }
} else {
    if ($DEBUG) {
        error_log("error file created " . $html_err_file);
    }
}

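// Write the response body to $html_file (for debugging only). Because this is set
// after CURLOPT_RETURNTRANSFER, curl_exec() writes to the file and returns true/false.
curl_setopt($ch, CURLOPT_FILE, $fp_out);
curl_setopt($ch, CURLOPT_STDERR, $fp_err); // destination for cURL's error output
$result = curl_exec($ch);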

// A zero-size file is created if no data is downloaded or the URL does not exist (e.g. pron00.com), so handle that case here.
if (@filesize($html_file) > 0) {
    // file exists and contains some data
} else {
    return false;
}
if ($result === false) {
    trigger_error(curl_error($ch));
    if ($DEBUG) {
        error_log("curl_exec failed");
    }
    return false;
}

fclose($fp_out);
fclose($fp_err);
curl_close($ch);
return $result;
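
One way to narrow down a blank result is to look at what the transfer itself reports rather than only at the saved file. The following is a minimal sketch, not part of the code above: fetch_with_diagnostics is a hypothetical helper name, and the 10-second default timeout is an assumption. It sets only CURLOPT_RETURNTRANSFER (no CURLOPT_FILE), so curl_exec() returns the body, and it collects the cURL error, the HTTP status and the final URL after redirects.

```php
<?php
// Hypothetical helper (an illustration, not from the original post): run one
// GET request and collect the details needed to explain an empty response.
function fetch_with_diagnostics($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // curl_exec() returns the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow redirects, if any
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);  // max. seconds to wait while connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);         // max. seconds for the whole transfer
    curl_setopt($ch, CURLOPT_ENCODING, "");              // accept all supported encodings

    $body = curl_exec($ch);                              // false when the transfer itself fails

    // The handle is released automatically when $ch goes out of scope.
    return array(
        'body'          => $body,
        'errno'         => curl_errno($ch),
        'error'         => curl_error($ch),
        'http_code'     => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'effective_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
    );
}
```

Comparing http_code and effective_url from this helper with the output of curl -v www.wikipedia.org may show whether the two clients are being redirected or answered differently.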

1 answer

  • doucong4535 2015-01-06 09:32

    You can do this using the code below:

    <?php
    
    $debug = 1;
    $fb_page_url = "http://www.wikipedia.org";
    $cookies = 'cookies.txt';
    touch($cookies);
    $uagent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/36.0.1985.125 Chrome/36.0.1985.125 Safari/537.36';
    
    
    /**
     * Get __VIEWSTATE & __EVENTVALIDATION (hidden fields used by ASP.NET WebForms pages)
     */
    $ch = curl_init($fb_page_url);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $uagent);
    
    $html = curl_exec($ch);
    
    curl_close($ch);
    
    // Extract the hidden form fields; fall back to an empty string when the page has none
    preg_match('~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~', $html, $viewstate);
    preg_match('~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~', $html, $eventValidation);
    
    $viewstate = isset($viewstate[1]) ? $viewstate[1] : '';
    $eventValidation = isset($eventValidation[1]) ? $eventValidation[1] : '';
    
    
    
    /**
     * Start fetching process
     */
    $ch = curl_init();
    
    curl_setopt($ch, CURLOPT_URL, $fb_page_url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0); // 0 = no explicit connect timeout
    curl_setopt($ch, CURLOPT_TIMEOUT, 9850); // max. seconds for the whole transfer
    curl_setopt($ch, CURLOPT_USERAGENT, $uagent);
    
    // Collecting all POST fields
    $postfields = array();
    //$postfields['__EVENTTARGET'] = ""; //this is for further clicking any link
    //$postfields['__EVENTARGUMENT'] = ""; //this is for further clicking any link
    $postfields['__LASTFOCUS'] = "";
    $postfields['__VIEWSTATE'] = $viewstate;
    $postfields['__EVENTVALIDATION'] = $eventValidation;
    $postfields['hidStates'] = "";
    
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
    $ret = curl_exec($ch); // fetched page as a string, or false on failure
    
    if ($debug) {
        echo $ret;
    }
    curl_close($ch);
    ?>
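
For a plain page fetch, the part of the code above that most plausibly matters is the User-Agent header: command-line curl sends one by default, while PHP cURL sends none unless CURLOPT_USERAGENT is set, and some sites may answer such requests differently. The __VIEWSTATE POST step targets ASP.NET forms and is probably not needed here. The following is a minimal sketch under those assumptions; build_fetch_options and fetch_page are hypothetical helper names, and the redirect limit and timeouts are illustrative values, not taken from the question.

```php
<?php
// Hypothetical helpers (illustrative sketch, not the original poster's code).

// Build the option set for a simple GET with an explicit User-Agent.
function build_fetch_options($url, $userAgent, $timeout = 10)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent, // PHP cURL sends no User-Agent unless this is set
        CURLOPT_RETURNTRANSFER => true,       // curl_exec() returns the body as a string
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects, if any
        CURLOPT_MAXREDIRS      => 5,          // assumed limit, to avoid redirect loops
        CURLOPT_CONNECTTIMEOUT => $timeout,   // max. seconds to wait while connecting
        CURLOPT_TIMEOUT        => $timeout,   // max. seconds for the whole transfer
        CURLOPT_ENCODING       => "",         // accept all supported encodings
    );
}

// Fetch a page and return its HTML, or false on a transport or HTTP error.
function fetch_page($url, $userAgent, $timeout = 10)
{
    $ch = curl_init();
    if (!curl_setopt_array($ch, build_fetch_options($url, $userAgent, $timeout))) {
        error_log("could not apply cURL options");
        return false;
    }

    $html = curl_exec($ch);
    if ($html === false) {
        error_log("curl_exec failed: " . curl_error($ch));
        return false;
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        error_log("HTTP status " . $status . " for " . $url);
        return false;
    }

    return $html; // the handle is released when $ch goes out of scope
}
```

A call such as fetch_page("http://www.wikipedia.org", $uagent) would then return the page as a string, which can be saved with file_put_contents() if a copy on disk is needed.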
    