dr637349 2016-01-29 02:14

cURL Ajax - XML response - Charset

I'm about to scrape a website with several tabs. On each tab click, an AJAX request is sent to the server, which returns the data of that tab for display.

Since I need to fetch that data, I inspected the HTTP requests and replayed them with modified headers on hurl.it (a website) to check the response. There I receive the correct result, but when I set up my cURL session with the same headers, the response is different and not readable.

With the Live HTTP Headers add-on I was able to extract the AJAX URL:

Request
POST http://xxxx.xxx.xx/Organisation/AjaxScopeQualification/0e69a479-63e3-4d64-9340-f2e9cc8d84df?tabIndex=3

HEADERS
Content-Type: application/xml
X-Requested-With: XMLHttpRequest
Referer: http://xxxx.xxx.xx/Organisation/Details/41283

Response via hurl.it
200 OK 646 bytes 547 ms

HEADERS

Cache-Control: private
Content-Encoding: gzip
Content-Length: 382
Content-Type: application/json; charset=utf-8
Date: Fri, 29 Jan 2016 01:36:42 GMT
Server: Microsoft-IIS/7.5
Set-Cookie: .ASPXANONYMOUS=fsbx3gX1CykkKL2OIvPFH9GcPj97KEPkK-6WVTA24eI87k0F3gjpt0fyVA2P90S8heeaoqjUps9-UFtzgm8mRAiPqnbS50kytk_NY5K4yHPwa-5l0kCqNzPAo0yjBsPmbisbg3N7P7h6Oz5EdRaN8Fkr0y3G6wdIILI8yMQBj1S1X4GULf9rpQ8IvvSo13KB0; expires=Fri, 29-Jan-2016 03:36:42 GMT; path=/; HttpOnly
X-Aspnet-Version: 4.0.30319
X-Aspnetmvc-Version: 3.0
X-Powered-By: ASP.NET
BODY

{"data":[ {"Id":"9fe29051-31e6-4bfa-a2f1-194d70c0aab9","NrtId":"930ec525-2199-44a9-bc27-c1b28524c9bf","RtoId":"0e69a479-63e3-4d64-9340-f2e9cc8d84df","TrainingComponentType":2,"Code":"TLI41210","Title":"Certificate IV in Transport and Logistics (Road Transport - Car Driving Instruction)","IsImplicit":false,"ExtentId":"01","Extent":"Deliver and assess","StartDate":new Date(2011,11,7,0,0,0),"EndDate":new Date(2016,11,6,0,0,0),"DeliveryNsw":true,"DeliveryVic":true,"DeliveryQld":true,"DeliverySa":true,"DeliveryWa":true,"DeliveryTas":true,"DeliveryNt":true,"DeliveryAct":true,"ScopeDecisionType":0,"ScopeDecision":"Deliver and assess"}],"total":1}
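One thing to watch when decoding this body: StartDate and EndDate are written as new Date(...), which is JavaScript rather than JSON, so json_decode() returns null for the body as-is. A minimal sketch of one way to handle it, assuming every date has exactly six numeric arguments as above; normalizeJsDates() is a hypothetical helper name, and the plain regex would also rewrite a matching pattern inside a string value:

```php
<?php
// Hypothetical helper (not from the original post): rewrites
// new Date(y,m,d,h,i,s) literals into quoted ISO-8601 style strings.
function normalizeJsDates($body)
{
    return preg_replace_callback(
        '/new Date\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)/',
        function ($m) {
            // JavaScript months are zero-based, so 11 means December.
            return sprintf(
                '"%04d-%02d-%02dT%02d:%02d:%02d"',
                $m[1], $m[2] + 1, $m[3], $m[4], $m[5], $m[6]
            );
        },
        $body
    );
}

// Shortened copy of the response body shown above.
$body = '{"data":[{"Code":"TLI41210","StartDate":new Date(2011,11,7,0,0,0)}],"total":1}';

$raw    = json_decode($body);                    // null: not valid JSON
$parsed = json_decode(normalizeJsDates($body));  // stdClass with StartDate as a string
```

After this step, $parsed->data[0]->StartDate holds "2011-12-07T00:00:00" and can be passed to DateTime if a real date object is needed.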

**Response from cURL - var_dump()**
string(382) "�m��j�0�_E蔀����|+�=�B�Kz(=��q8���ICȻWζiq�t��������{ ����y�r;��r�D���@��P���t����Ǚ.�Z������ZaX�;�N�z����~(�[Jor��������7F��H1h������E~�!����aJ#��'䭮�>���Mg�Vr��Ǚ��ȊK�S��A��&݇L�evu���Sl3;�ᱴd]�4�pR�.�]��1�@�`�X��?��ty����p�8����1�R=�t(S�6�[�+-����Vr9��#���f�4���������2#�Ew��їѯ� ���r��FGZ�O��\���.䲰�7���f^�W���[��;Z���"



Is this a charset problem, or am I setting my cURL options wrong?

CURL

$url = 'http://xxxx.xxx.xx/Organisation/AjaxDetailsLoadScope/e11d03e7-37e7-49e8-be54-0bed8eb1c247?_=1454029562507&tabIndex=3';
$header = array(
        'Accept: */*',
        'Accept-Encoding: gzip, deflate',
        'Content-Length: 0',
        'Content-Type: application/xml',
        'X-Requested-With: XMLHttpRequest',
        "Referer: http://xxxx.xxx.xx/Organisation/Details/$this->code"
    );

//.. 
//$header and $url are saved in arrays and then passed to curlMulti()

function curlMulti($urls, $headers = false) {
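    $mh = curl_multi_init();
    $ch = array();      // one easy handle per URL, keyed like $urls
    $results = array(); // response bodies, keyed like $urls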
    // For each of the URLs in array
    foreach ($urls as $id => $d) {
        $ch[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;

        if (is_array($headers) && !empty($headers[$id])) {
            curl_setopt($ch[$id], CURLOPT_POST, 1);
            curl_setopt($ch[$id], CURLOPT_HTTPHEADER, $headers[$id]);
        }


        curl_setopt($ch[$id], CURLOPT_URL, $url);
        curl_setopt($ch[$id], CURLOPT_RETURNTRANSFER, TRUE);
        curl_multi_add_handle($mh, $ch[$id]); 
    }
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        // Wait for socket activity instead of spinning the CPU.
        if ($running > 0) {
            curl_multi_select($mh, 1.0);
        }
    } while ($running > 0); // Loop until all transfers have finished

    foreach ($ch as $id => $content) {
        $results[$id] = curl_multi_getcontent($content); 
        curl_multi_remove_handle($mh, $content);  
    }
    curl_multi_close($mh); 
    return $results; 
}

1 answer

  • duanqiao2006 2016-01-29 03:45

    I played around with the headers a bit and got it working.

    I had to remove 'Accept: */*' and 'Accept-Encoding: gzip, deflate' from the header array:

    $header = array(
        'Content-Length: 0',
        'Content-Type: application/xml',
        'X-Requested-With: XMLHttpRequest',
        "Referer: http://xxxx.xxx.xx/Organisation/Details/$this->code"
    );
    

    Works like a charm:

    stdClass Object
    (
    [data] => Array
        (
            [0] => stdClass Object
                (
                    [Id] => 9fe29051-31e6-4bfa-a2f1-194d70c0aab9
                    [NrtId] => 930ec525-2199-44a9-bc27-c1b28524c9bf
                    [RtoId] => 0e69a479-63e3-4d64-9340-f2e9cc8d84df
                    [TrainingComponentType] => 2
                    [Code] => TLI41210
                    [Title] => Certificate IV in Transport and Logistics (Road Transport - Car Driving Instruction)
                    [IsImplicit] => 
                    [ExtentId] => 01
                    [Extent] => Deliver and assess
                    [DeliveryNsw] => 1
                    [DeliveryVic] => 1
                    [DeliveryQld] => 1
                    [DeliverySa] => 1
                    [DeliveryWa] => 1
                    [DeliveryTas] => 1
                    [DeliveryNt] => 1
                    [DeliveryAct] => 1
                    [ScopeDecisionType] => 0
                    [ScopeDecision] => Deliver and assess
                )
    
        )
    
    [total] => 1
    )
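
The garbled string is most likely not a charset issue: its length, string(382), matches the Content-Length: 382 of the gzip-encoded response, so var_dump() is showing the still-compressed body. When Accept-Encoding is set by hand through CURLOPT_HTTPHEADER, cURL passes the compressed bytes through untouched. Removing the header is one fix; another is to keep compression and let cURL decode it via CURLOPT_ENCODING. A minimal sketch, assuming the curl and zlib extensions are available; makeAjaxHandle() is a hypothetical helper name, not part of the original code, and the gzencode()/gzdecode() part only reproduces the symptom offline:

```php
<?php
// Hypothetical helper: builds one easy handle that requests a compressed
// response and lets cURL decompress it before returning the body.
function makeAjaxHandle($url, array $headers)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, '');       // empty POST body
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); // no Accept-Encoding here
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Empty string: advertise every encoding this cURL build supports
    // and transparently decode the response.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    return $ch;
}

// Offline illustration of the symptom: a gzip-compressed JSON body is
// binary data starting with the magic bytes 0x1f 0x8b until it is decoded.
$json         = '{"data":[],"total":0}';
$compressed   = gzencode($json);
$looksGzipped = (substr($compressed, 0, 2) === "\x1f\x8b");
$decoded      = gzdecode($compressed);
```

Inside curlMulti(), the same effect should be achievable by adding curl_setopt($ch[$id], CURLOPT_ENCODING, '') next to the other options and leaving 'Accept-Encoding' out of $header.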
    