I know that cURL does not execute JavaScript; it only fetches the static HTML, which is why a simple cURL request will not work for me. I don't know much about PHP (I'm new to this), but what I understand so far is that if I did not have to log in first, I could simply use `file_get_contents`, which would first execute the dynamic content and then return the resulting HTML, which is what I need. However, I have to log in first and then get the page.

I tried to log in using cURL:
```php
<?php
$user = "myuser";
$pass = "mypassword";

// Create an empty cookie file with a random name
$random = rand(0, 9999999);
$cookie = $random . "cookie.txt";
$fp = fopen($cookie, "w") or die("<BR><B>Unable to open cookie file $cookie for write!<BR>");
fclose($fp);

// Log in using cURL
$LOGINURL = "https://controlpanel.example.com/index.html";
$agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20120101 Firefox/29.0";
$v2 = array('userName' => $user, 'password' => $pass);
$referer = "https://www.google.com";

// First request: load the login page so the server can set its cookie
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_URL, $LOGINURL);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false); // certificate checks disabled
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
curl_exec($ch);
curl_close($ch); // cookies are written to $cookie when the handle is released
unset($ch);

// Now that the cookie is set, log in
$ch = curl_init();
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($v2)); // urlencoded, like a browser form
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_URL, $LOGINURL);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
$result = curl_exec($ch);

// Now we are logged in: grab the page I need with the same handle
$profileurl = 'https://controlpanel.example.com/information.html';
curl_setopt($ch, CURLOPT_URL, $profileurl);
curl_setopt($ch, CURLOPT_HTTPGET, true); // switch from POST back to GET
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$result = curl_exec($ch);
```
But this only gets the static HTML, not the dynamically loaded content. To explain in more detail: at this point, the HTML I get in `$result` using the cURL method above is:

```html
.....
<div id="DisplayAccountInfo"><span class="loading">Loading info</span></div>
.....
```
If I do the same thing manually in Firefox and inspect the element with Firebug, the source is:

```html
.....
<div id="DisplayAccountInfo">
<div class="formModule" id="formContainer">
......
<legend>Your code for this hour is 8T5D9LO</legend>
.....
</div>
</div>
.....
```
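Once HTML containing that `<legend>` is available as a string, the code itself can be pulled out with a simple regular expression. This is a minimal sketch: the function name `extract_hour_code` is hypothetical, and it assumes the wording "Your code for this hour is" stays the same and that the code consists only of letters and digits.

```php
<?php
// Hypothetical helper: extract the hourly code from HTML like the <legend> above.
// Assumption: the text wording is fixed and the code is alphanumeric.
function extract_hour_code(string $html): ?string
{
    if (preg_match('/Your code for this hour is\s+([A-Za-z0-9]+)/', $html, $m)) {
        return $m[1];
    }
    return null; // e.g. the page still only shows "Loading info"
}
```

A DOM parser (such as PHP's `DOMDocument` with XPath) would be more robust if the markup around the text changes, at the cost of a little more code.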
In the Firebug console I see this request:

```
GET https://controlpanel.example.com/async/information.html
200 OK
669ms
jquery-....min.js (line 19)
```
As a beginner, what I understand from this is that the content is loaded dynamically by jQuery (an AJAX request made by the page's JavaScript after it loads), and cURL cannot do that because it does not run JavaScript.
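Since the console shows the exact URL jQuery requests, one option is to skip the JavaScript and request that URL directly with the cURL handle that performed the login, so the same session cookie is sent. This is a sketch under assumptions: the function names are hypothetical, the server may or may not check the `X-Requested-With: XMLHttpRequest` header that jQuery adds to same-origin AJAX requests (it is sent here just in case), and the response may be an HTML fragment or another format such as JSON.

```php
<?php
// Hypothetical helper: headers resembling those jQuery sends with $.ajax()
// for same-origin requests. Whether the server checks them is an assumption.
function ajax_headers(): array
{
    return ['X-Requested-With: XMLHttpRequest'];
}

// Hypothetical helper: fetch $url with an existing (logged-in) cURL handle,
// so the cookies it already holds are sent again. Returns null on failure.
function fetch_async_fragment($ch, string $url, string $referer): ?string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPGET, true);         // switch back from POST to GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_REFERER, $referer);     // page that normally makes the call
    curl_setopt($ch, CURLOPT_HTTPHEADER, ajax_headers());
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

After the login request, usage would look like `$fragment = fetch_async_fragment($ch, 'https://controlpanel.example.com/async/information.html', $profileurl);`, with `$fragment` then containing whatever the server returns for that endpoint.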
I also tried replacing this:

```php
$profileurl = 'https://controlpanel.example.com/information.html';
curl_setopt($ch, CURLOPT_URL, $profileurl);
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$result = curl_exec($ch);
```

with this:

```php
$result = file_get_contents($profileurl);
```
but then I get the HTML of the login page instead, I think because `file_get_contents` does not send the cookies from my cURL session, so the server no longer recognizes me as logged in.
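`file_get_contents` keeps no cookies of its own, so it can only look logged in if the session cookie is passed explicitly through a stream context. The sketch below builds a `Cookie:` header from the Netscape-format file that cURL writes for `CURLOPT_COOKIEJAR`. It assumes the handle that performed the login had `CURLOPT_COOKIEJAR` set and has already been released, so the session cookie is actually in `$cookie`; the helper names are hypothetical, and domain, path and expiry are ignored for simplicity. Note that `file_get_contents` does not execute JavaScript either, so this only returns the same static page that cURL returns.

```php
<?php
// Hypothetical helper: turn a cURL cookie-jar file into a "Cookie:" header value.
// Simplification: domain, path, secure flag and expiry are all ignored.
function cookie_header_from_jar(string $jarContents): string
{
    $pairs = [];
    foreach (preg_split('/\R/', $jarContents) as $line) {
        // cURL writes HttpOnly cookies (often the session cookie) with this prefix
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Fields: domain, subdomains flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $pairs[] = $fields[5] . '=' . $fields[6];
        }
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: stream context that sends the cookies with file_get_contents().
function cookie_context(string $cookieHeader)
{
    return stream_context_create([
        'http' => ['header' => 'Cookie: ' . $cookieHeader],
    ]);
}
```

Usage would be along the lines of `$html = file_get_contents($profileurl, false, cookie_context(cookie_header_from_jar(file_get_contents($cookie))));`.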
How can I solve this? Any help is appreciated.