dongpo7467 2014-08-28 10:27
Views: 46
Accepted

Fetching dynamically generated content after logging in with curl

I know that curl does not execute JavaScript; it only fetches static HTML, which is why a simple curl request will not work for me. I'm new to PHP and don't know much about it, but as I understand it so far, if I didn't have to log in first, I could simply use file_get_contents, which would first execute the dynamic content and then return the HTML I need. However, I have to log in first and then fetch the page. I tried to log in using curl:

$user = "myuser";
$pass = "mypassword";

//create cookie file
$random = rand(0,9999999);
$cookie = $random."cookie.txt";
$fp = fopen($cookie, "w") or die("<BR><B>Unable to open cookie file $cookie for write!<BR>");
fclose($fp);

//do login using curl
$LOGINURL = "https://controlpanel.example.com/index.html";
$agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20120101 Firefox/29.0";
$v2 = array( 'userName'=>$user, 'password'=>$pass);
$reffer = "https://www.google.com";
//this first call is to set the cookie
$ch = curl_init(); 
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
    curl_setopt($ch, CURLOPT_URL,$LOGINURL);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
ob_start();      // Prevent output
curl_exec ($ch);
ob_end_clean();  // End preventing output
curl_close ($ch);
unset($ch);
//now that the cookie is set, do login
$ch = curl_init();
    curl_setopt($ch, CURLOPT_POST, true); 
    curl_setopt($ch, CURLOPT_POSTFIELDS,$v2); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
    curl_setopt($ch, CURLOPT_URL,$LOGINURL);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_REFERER, $reffer);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);

$result = curl_exec($ch);

//now we are logged-in
//now grab the page you need

$profileurl = 'https://controlpanel.example.com/information.html';
curl_setopt($ch, CURLOPT_URL, $profileurl);
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);

$result = curl_exec ($ch);

But this only gets the static HTML, not the dynamic content. Let me explain. Using the curl method above, what I get in $result at this point is:

.....
<div id="DisplayAccountInfo"><span class="loading">Loading info</span></div>
.....

If I do this manually in Firefox and inspect the element with Firebug, the source is:

.....
<div id="DisplayAccountInfo">
  <div class="formModule" id="formContainer">
    ......
       <legend>Your code for this hour is 8T5D9LO</legend>
    .....
  </div>
</div>
.....

What I notice in the Firebug console is:

GET https://controlpanel.example.com/async/information.html    200 OK    669ms    jquery-....min.js (line 19)

What I, as a beginner, understand from this is that the content is loaded dynamically using jQuery, and curl does not know how to do that.

I tried replacing this:

$profileurl = 'https://controlpanel.example.com/information.html';
curl_setopt($ch, CURLOPT_URL, $profileurl);
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);

$result = curl_exec ($ch);

with this:

$result = file_get_contents($profileurl);

but I get the HTML of the login page, I think because it no longer recognizes that I'm logged in.
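
A plain file_get_contents($profileurl) call makes a fresh request that knows nothing about cURL's cookies, which would explain getting the login page back. (It does not run JavaScript either, so on its own it would not fill in the div.) As a rough sketch only, the session cookies could be read back from the cookie file and passed along through a stream context. The helper names `cookie_header_from_jar` and `context_with_cookies` are made up for this example. It assumes the file is in the Netscape format that cURL writes, that the login handle also sets CURLOPT_COOKIEJAR and is closed so the session cookie actually reaches the file, and it does no domain or path filtering.

```php
<?php
// Hypothetical helper: turn the contents of a Netscape-format cookie jar
// (the format cURL writes via CURLOPT_COOKIEJAR) into a Cookie header value.
function cookie_header_from_jar(string $jarContents): string
{
    $pairs = [];
    foreach (preg_split('/\R/', $jarContents) as $line) {
        // cURL marks HttpOnly cookies with this prefix; strip it so the line parses.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $pairs[] = $fields[5] . '=' . $fields[6];
        }
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: a stream context that sends the cookies with
// file_get_contents(). The "http" options also apply to https:// URLs.
function context_with_cookies(string $cookieHeader)
{
    return stream_context_create([
        'http' => ['method' => 'GET', 'header' => "Cookie: $cookieHeader\r\n"],
    ]);
}

// Usage sketch (assumes $cookie and $profileurl from the code above):
// $ctx = context_with_cookies(cookie_header_from_jar(file_get_contents($cookie)));
// $result = file_get_contents($profileurl, false, $ctx);
```

Staying with the same cURL handle for every request avoids this hand-off entirely, since the handle keeps its cookies in memory.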

So how can I solve this? Can you please help me?

2 answers

  • dt2015 2014-08-28 20:05

    Haha, so easy that it did not cross my mind. In my case the fix is simple: I did not have to call

    https://controlpanel.example.com/information.html

    but

    https://controlpanel.example.com/async/information.html

    to get the div I wanted :)

    Luckily for me, I noticed the GET request in Firebug :)
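
The request that jQuery issues in the background is an ordinary GET, so cURL can send the same one once the session cookie is in place. Here is a rough sketch of the options such a request might use. The function name `async_request_options` is invented for this example, and the `X-Requested-With` header is an assumption: jQuery adds it to same-origin AJAX calls and some servers check for it, but nothing here shows that this site does.

```php
<?php
// Hypothetical helper: cURL options for repeating the background GET that
// Firebug showed, reusing the cookies collected during login.
function async_request_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HTTPGET        => true,         // plain GET, like the jQuery call
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_COOKIEFILE     => $cookieFile,  // send cookies stored in this file
        CURLOPT_COOKIEJAR      => $cookieFile,  // save updated cookies when the handle closes
        CURLOPT_TIMEOUT        => 5,
        // Assumption: the server may look for this header on AJAX endpoints.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
    ];
}

// Usage sketch (not executed here; $cookie is the cookie file from the login step):
// $ch = curl_init();
// curl_setopt_array($ch, async_request_options(
//     'https://controlpanel.example.com/async/information.html', $cookie));
// $fragment = curl_exec($ch);
// if ($fragment === false) { echo curl_error($ch); }
// curl_close($ch);
```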

    So the code is now:

    $user = "myuser";
    $pass = "mypassword";
    
    //create cookie file
    $random = rand(0,9999999);
    $cookie = $random."cookie.txt";
    $fp = fopen("$cookie","w") or die("<BR><B>Unable to open cookie file $cookie for write!<BR>");
    fclose($fp);
    
    //do login using curl
    $LOGINURL = "https://controlpanel.example.com/index.html";
    $agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20120101 Firefox/29.0";
    $v2 = array( 'userName'=>$user, 'password'=>$pass);
    $reffer = "https://www.google.com";
    //this first call is to set the cookie
    
    $ch = curl_init(); 
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
        curl_setopt($ch, CURLOPT_URL,$LOGINURL);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    ob_start();      // Prevent output
    curl_exec ($ch);
    ob_end_clean();  // End preventing output
    curl_close ($ch);
    unset($ch);
    
    //now that the cookie is set, do login
    $ch = curl_init();
        curl_setopt($ch, CURLOPT_POST, true); 
        curl_setopt($ch, CURLOPT_POSTFIELDS,$v2); 
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
        curl_setopt($ch, CURLOPT_URL,$LOGINURL);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_REFERER, $reffer);
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    
    $result = curl_exec($ch);
    
    //now we are logged-in
    //now grab the page you need
    
    $profileurl = 'https://controlpanel.example.com/async/information.html';
    curl_setopt($ch, CURLOPT_URL, $profileurl);
    curl_setopt($ch, CURLOPT_HTTPGET, true); // switch the reused handle from POST back to GET
    
    $result = curl_exec ($ch);
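
The response from the async URL should be just the fragment that jQuery would otherwise insert into #DisplayAccountInfo. As a hedged sketch, the text of the <legend> shown in the question could be pulled out of $result with PHP's DOM extension. The helper name `extract_legend` is made up here, and the sample markup in the usage lines is only modelled on the snippet from the question (the `<fieldset>` wrapper is a guess); the real fragment may differ.

```php
<?php
// Hypothetical helper: return the text of the first <legend> in an HTML
// fragment, or null if there is none.
function extract_legend(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }
    $doc = new DOMDocument();
    // Fragments are rarely valid documents; collect parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $legend = $doc->getElementsByTagName('legend')->item(0);
    return $legend === null ? null : trim($legend->textContent);
}

// Usage with markup modelled on the question:
$sample = '<div class="formModule" id="formContainer"><fieldset>'
        . '<legend>Your code for this hour is 8T5D9LO</legend></fieldset></div>';
echo extract_legend($sample), "\n"; // prints: Your code for this hour is 8T5D9LO
```

In the real script the call would be extract_legend($result), with a check for null in case the session has expired and the login page came back instead.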
    
    This answer was selected by the asker as the best answer.