I'm about to scrape a website with several tabs. On each tab click, an AJAX request is sent to the server, which returns the data for that tab so it can be displayed.
Since I need to fetch that data, I inspected the HTTP requests and replayed them with modified headers on hurl.it (a website) to check the response. I get the correct results there, but when I set up my cURL session with the same headers, the response is different and unreadable.
Using the Live HTTP Headers add-on, I was able to extract the AJAX URL:
**Request**

    POST http://xxxx.xxx.xx/Organisation/AjaxScopeQualification/0e69a479-63e3-4d64-9340-f2e9cc8d84df?tabIndex=3

Headers:

    Content-Type: application/xml
    X-Requested-With: XMLHttpRequest
    Referer: http://xxxx.xxx.xx/Organisation/Details/41283
**Response via hurl.it** (200 OK, 646 bytes, 547 ms)

Headers:

    Cache-Control: private
    Content-Encoding: gzip
    Content-Length: 382
    Content-Type: application/json; charset=utf-8
    Date: Fri, 29 Jan 2016 01:36:42 GMT
    Server: Microsoft-IIS/7.5
    Set-Cookie: .ASPXANONYMOUS=fsbx3gX1CykkKL2OIvPFH9GcPj97KEPkK-6WVTA24eI87k0F3gjpt0fyVA2P90S8heeaoqjUps9-UFtzgm8mRAiPqnbS50kytk_NY5K4yHPwa-5l0kCqNzPAo0yjBsPmbisbg3N7P7h6Oz5EdRaN8Fkr0y3G6wdIILI8yMQBj1S1X4GULf9rpQ8IvvSo13KB0; expires=Fri, 29-Jan-2016 03:36:42 GMT; path=/; HttpOnly
    X-Aspnet-Version: 4.0.30319
    X-Aspnetmvc-Version: 3.0
    X-Powered-By: ASP.NET

Body:

    {"data":[ {"Id":"9fe29051-31e6-4bfa-a2f1-194d70c0aab9","NrtId":"930ec525-2199-44a9-bc27-c1b28524c9bf","RtoId":"0e69a479-63e3-4d64-9340-f2e9cc8d84df","TrainingComponentType":2,"Code":"TLI41210","Title":"Certificate IV in Transport and Logistics (Road Transport - Car Driving Instruction)","IsImplicit":false,"ExtentId":"01","Extent":"Deliver and assess","StartDate":new Date(2011,11,7,0,0,0),"EndDate":new Date(2016,11,6,0,0,0),"DeliveryNsw":true,"DeliveryVic":true,"DeliveryQld":true,"DeliverySa":true,"DeliveryWa":true,"DeliveryTas":true,"DeliveryNt":true,"DeliveryAct":true,"ScopeDecisionType":0,"ScopeDecision":"Deliver and assess"}],"total":1}
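Even once the body is readable, it is not strict JSON: `StartDate` and `EndDate` are JavaScript `new Date(...)` constructor calls, so PHP's `json_decode()` returns `null` for it. A minimal sketch, assuming the dates always appear as six comma-separated integers and that the month argument is zero-based as in JavaScript's `Date` constructor (an assumption, not something the server documents), rewrites them into quoted strings before decoding. `normalizeJsDates()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: replace JavaScript-style `new Date(y,m,d,h,i,s)` literals
// with quoted "Y-m-d H:i:s" strings so the payload becomes valid JSON.
// Assumption: the month argument is zero-based, as in JavaScript's Date constructor.
function normalizeJsDates(string $payload): string
{
    return preg_replace_callback(
        '/new Date\((\d+),\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)/',
        function (array $m): string {
            return sprintf(
                '"%04d-%02d-%02d %02d:%02d:%02d"',
                $m[1], $m[2] + 1, $m[3], $m[4], $m[5], $m[6]
            );
        },
        $payload
    );
}

// Trimmed excerpt of the body shown above.
$body = '{"data":[{"Code":"TLI41210","StartDate":new Date(2011,11,7,0,0,0)}],"total":1}';

var_dump(json_decode($body));             // NULL: not valid JSON as-is
$data = json_decode(normalizeJsDates($body), true);
echo $data['data'][0]['StartDate'], "\n"; // 2011-12-07 00:00:00
```

Converting to plain strings keeps the step independent of any date library; the strings can be passed to `DateTime` afterwards if needed.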
**Response from cURL (`var_dump()`)**
string(382) "�m��j�0�_E蔀����|+�=�B�Kz(=��q8���ICȻWζiq�t��������{ ����y�r;��r�D���@��P���t����Ǚ.�Z������ZaX�;�N�z����~(�[Jor��������7F��H1h������E~�!����aJ#��'䭮�>���Mg�Vr��Ǚ��ȊK�S��A��&݇L�evu���Sl3;�ᱴd]�4�pR�.�]��1�@�`�X��?��ty����p�8����1�R=�t(S�6�[�+-����Vr9��#���f�4���������2#�Ew��їѯ� ���r��FGZ�O��\���.䲰�7���f^�W���[��;Z���"
Is this a charset problem, or am I setting my cURL options wrong?
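One detail worth checking: hurl.it reports `Content-Encoding: gzip` and `Content-Length: 382`, and `var_dump()` shows `string(382)`. That match suggests the cURL body may be the still-compressed gzip payload rather than text in the wrong charset. A minimal sketch of how to test that hypothesis looks for the two gzip magic bytes (`0x1f 0x8b`) and, if present, decompresses with `gzdecode()`. The helper names `looksGzipped()` and `decodeIfGzipped()` are hypothetical, and the demo simulates the server reply with `gzencode()` instead of a real request.

```php
<?php
// Hypothetical helper: true if $body starts with the gzip magic number 0x1f 0x8b.
function looksGzipped(string $body): bool
{
    return strlen($body) >= 2 && $body[0] === "\x1f" && $body[1] === "\x8b";
}

// Hypothetical helper: decompress only if the body is still gzip-encoded,
// otherwise (or if decoding fails) return it unchanged.
function decodeIfGzipped(string $body): string
{
    if (!looksGzipped($body)) {
        return $body;
    }
    $decoded = gzdecode($body);
    return $decoded === false ? $body : $decoded;
}

// Simulate the server's reply: a JSON string compressed with gzip.
$json = '{"data":[],"total":0}';
$wire = gzencode($json);

var_dump(looksGzipped($wire));               // bool(true)
var_dump(decodeIfGzipped($wire) === $json);  // bool(true)
```

Applied to the string returned by `curl_multi_getcontent()`, a `true` from `looksGzipped()` would point at content encoding rather than charset.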
**cURL**

    $url = 'http://xxxx.xxx.xx/Organisation/AjaxDetailsLoadScope/e11d03e7-37e7-49e8-be54-0bed8eb1c247?_=1454029562507&tabIndex=3';
    $header = array(
        'Accept: */*',
        'Accept-Encoding: gzip, deflate',
        'Content-Length: 0',
        'Content-Type: application/xml',
        'X-Requested-With: XMLHttpRequest',
        "Referer: http://xxxx.xxx.xx/Organisation/Details/{$this->code}"
    );
    //..
    // $header and $url are saved in arrays and then passed to curlMulti()

    function curlMulti($urls, $headers = false) {
        $mh = curl_multi_init();
        $ch = array();
        $results = array();

        // Create one easy handle per URL and attach it to the multi handle
        foreach ($urls as $id => $d) {
            $ch[$id] = curl_init();
            $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;

            // If headers were supplied for this request, send it as a POST with them
            if (is_array($headers) && !empty($headers[$id])) {
                curl_setopt($ch[$id], CURLOPT_POST, 1);
                curl_setopt($ch[$id], CURLOPT_HTTPHEADER, $headers[$id]);
            }

            curl_setopt($ch[$id], CURLOPT_URL, $url);
            curl_setopt($ch[$id], CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
            curl_multi_add_handle($mh, $ch[$id]);
        }

        // Run all transfers until none is still active
        $running = null;
        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh); // wait for activity instead of busy-looping
        } while ($running > 0);

        // Collect each response body and release the handles
        foreach ($ch as $id => $handle) {
            $results[$id] = curl_multi_getcontent($handle);
            curl_multi_remove_handle($mh, $handle);
            curl_close($handle);
        }
        curl_multi_close($mh);

        return $results;
    }
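The headers array sends `Accept-Encoding: gzip, deflate` by hand, so the server is allowed to compress the reply, but libcurl is never told to decode it; automatic decompression is enabled through `CURLOPT_ENCODING`, not through a raw header. A minimal sketch of an adjusted handle setup, assuming the rest of `curlMulti()` stays as it is, drops the hand-written header and sets `CURLOPT_ENCODING` to an empty string, which makes cURL advertise every encoding it supports and return the decompressed body. `withoutAcceptEncoding()` and `configureHandle()` are hypothetical helper names.

```php
<?php
// Hypothetical helper: remove any hand-written Accept-Encoding header line
// so that cURL can negotiate the encoding itself.
function withoutAcceptEncoding(array $headers): array
{
    return array_values(array_filter($headers, function ($line) {
        return stripos(ltrim($line), 'accept-encoding:') !== 0;
    }));
}

// Hypothetical helper: configure an easy handle so cURL sends its own
// Accept-Encoding header and transparently decompresses gzip/deflate replies.
function configureHandle($ch, string $url, array $headers): void
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_HTTPHEADER, withoutAcceptEncoding($headers));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Empty string: advertise all encodings this libcurl build supports and decode the reply
    curl_setopt($ch, CURLOPT_ENCODING, '');
}
```

Inside the `foreach` of `curlMulti()`, the branch that runs when headers are present could call `configureHandle($ch[$id], $url, $headers[$id])` instead of setting the options one by one; unlike the original, this helper always sends a POST, so it only fits the requests that carry headers.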