doucheng4094 2013-05-02 18:04
Accepted

Speeding up cURL page login and scraping [duplicate]

I have a function that logs into a site and searches for a string on the page that follows. The process currently takes about 10 seconds, and I would like to know whether there is anything I can do to speed it up. Is it possible to have the cURL login persist across the client's session, or to search the document more efficiently?

    // Logs in by POSTing the credentials and returns the cURL handle for reuse.
    public function curlLogin($url, $post_values, $cookieJar) {

        $timeout = 30;

        $curl_connection = curl_init();
        curl_setopt($curl_connection, CURLOPT_URL, $url);
        curl_setopt($curl_connection, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($curl_connection, CURLOPT_USERAGENT,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
        // Write cookies to, and read cookies from, the same file.
        curl_setopt($curl_connection, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIESESSION, 0);
        curl_setopt($curl_connection, CURLOPT_HEADER, 1);
        curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, 1);
        // Note: this disables TLS certificate verification.
        curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl_connection, CURLOPT_POST, 1);
        curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_values);
        curl_setopt($curl_connection, CURLOPT_HTTPHEADER,
            array("Content-type: application/x-www-form-urlencoded"));
        // The login response body is discarded; only the cookies matter here.
        curl_exec($curl_connection);
        return $curl_connection;

    }

    // POSTs to $url on the handle returned by curlLogin() and returns the raw response.
    public function curlPost($curl_connection, $url, $post_values, $cookieJar) {

        $timeout = 30;

        curl_setopt($curl_connection, CURLOPT_URL, $url);
        curl_setopt($curl_connection, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($curl_connection, CURLOPT_USERAGENT,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
        curl_setopt($curl_connection, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIESESSION, 0);
        // CURLOPT_HEADER includes the response headers in the returned string.
        curl_setopt($curl_connection, CURLOPT_HEADER, 1);
        curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl_connection, CURLOPT_POST, 1);
        curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_values);
        curl_setopt($curl_connection, CURLOPT_HTTPHEADER,
            array("Content-type: application/x-www-form-urlencoded"));
        $result = curl_exec($curl_connection);
        return $result;

    }

$cookieJar = tempnam ("/tmp", "CURLCOOKIE");

$curl_connection = $this->curlLogin($login_url, $post_values, $cookieJar);

$result = $this->curlPost($curl_connection, $next_url, $params, $cookieJar);

// strpos() returns false when nothing is found and 0 for a match at the very
// start, so compare against false instead of testing "> 0".
if (strpos($result, 'string 1') !== false) {
    $success = true;
    $message = 'string 1 is present';
} elseif (strpos($result, 'string 2') !== false) {
    $success = false;
    $message = 'string 2 is present';
} elseif (strpos($result, 'string 3') !== false) {
    $success = false;
    $message = 'string 3 is present';
} else {
    $success = false;
    $message = 'None of the above strings are present.';
}

curl_close($curl_connection);
unlink($cookieJar);

1 answer

  • dongshan9338 2013-05-02 18:56
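    You can avoid logging in on every request by reusing your cookie jar.

    Create a file called cookies.txt in the directory containing your script, make sure PHP can write to it, and assign: $cookieJar = 'cookies.txt'. (A relative path is resolved against the current working directory, so an absolute path such as __DIR__ . '/cookies.txt' is safer.)

    After the script has run once, remove the call to curlLogin(), along with the tempnam() and unlink($cookieJar) lines so that the file persists between runs, and pass curlPost() a fresh handle from curl_init(). curlPost() should then send the saved cookies and return data as if you were logged in, for as long as the session remains valid on the server.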
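Because a saved session eventually expires, removing the login call permanently will stop working at some point. The following is a minimal sketch of one way to handle that, not part of the original answer: try the request with the saved cookies first, and log in again only when the response does not look like a logged-in page. The function name fetchWithSavedSession() and its three callable parameters are hypothetical, introduced only for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): fetch a page using the
// cookies already on disk, and log in again only if the session looks stale.
function fetchWithSavedSession(callable $fetchPage, callable $login, callable $looksLoggedIn): string
{
    // First attempt: rely on whatever cookies are stored in the cookie jar.
    $html = $fetchPage();

    if (!$looksLoggedIn($html)) {
        // Session missing or expired: refresh the cookie jar, then retry once.
        $login();
        $html = $fetchPage();
    }

    return $html;
}
```

With the methods from the question, $fetchPage could wrap curlPost() on a handle from curl_init(), $login could wrap curlLogin(), and $looksLoggedIn could be a strpos() check for text that only appears when logged in. That marker text is site-specific and is an assumption here. In the common case only one request is made instead of two.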

    Remember that CURLOPT_COOKIEFILE specifies the file cookies are read from, while CURLOPT_COOKIEJAR specifies the file the received cookies are written to when the handle is closed.

    So you can probably do without CURLOPT_COOKIEJAR in your curlPost() function.
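The read/write distinction between the two options can be sketched as a small option builder. This is an illustration only: cookieOptions() and its parameters are hypothetical names, while the CURLOPT_* constants are the ones already used in the question.

```php
<?php
// Hypothetical helper: build cURL options for a request that reuses saved cookies.
function cookieOptions(string $cookiePath, bool $saveNewCookies): array
{
    $options = [
        CURLOPT_COOKIEFILE     => $cookiePath, // read previously saved cookies from here
        CURLOPT_RETURNTRANSFER => true,        // return the response instead of printing it
    ];

    if ($saveNewCookies) {
        // Only needed when cookies set by the response should be written back,
        // for example on the login request.
        $options[CURLOPT_COOKIEJAR] = $cookiePath;
    }

    return $options;
}
```

The result can be applied with curl_setopt_array($curl_connection, cookieOptions('cookies.txt', false)) for the follow-up request, and with true as the second argument for the login request. If the site rotates its session cookie on every response, keeping CURLOPT_COOKIEJAR on both requests may be the safer choice.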

    This answer was accepted by the asker as the best answer.