douji8347 2012-09-20 09:35

cURL response arrives with a UTF-8 BOM

In my script I send data with cURL, with CURLOPT_RETURNTRANSFER enabled. The response is JSON-encoded data. When I try to json_decode it, the call returns null. I then found that the response contains the UTF-8 BOM at the beginning of the string (the bytes EF BB BF, displayed as ï»¿).

Here are some experiments:


$data = curl_exec($ch);
echo $data;

the result is {"field_1":"text_1","field_2":"text_2","field_3":"text_3"}

$data = curl_exec($ch);
echo mb_detect_encoding($data);

result - UTF-8

$data = curl_exec($ch);
echo mb_convert_encoding($data, 'UTF-8', mb_detect_encoding($data));
// identical to echo mb_convert_encoding($data, 'UTF-8', 'UTF-8');

result - {"field_1":"text_1","field_2":"text_2","field_3":"text_3"}


The only thing that helps is removing the first three bytes:

if (substr($data, 0, 3) == pack('CCC', 239, 187, 191)) {
    $data = substr($data, 3);
}

But what if a different BOM shows up? So the question is: how do I detect the correct encoding of a cURL response? Or how do I detect which BOM has arrived? Or perhaps how do I convert a response that carries a BOM?

3 answers

  • douhong4452 2012-09-20 09:49

    I'm afraid you have already found the answer yourself. The bad news is that I know of no better one.

    The BOM should not be there, and it's the sender's responsibility to not send it along.

    But I can reassure you: the BOM is either there or it is not, and if it is, it's the three bytes you already know.

    You can make the check slightly faster and handle any number of repeated BOMs with a small alteration:

    $__BOM = pack('CCC', 239, 187, 191);
    // Careful about the three ='s -- they're all needed.
    while(0 === strpos($data, $__BOM))
        $data = substr($data, 3);
    

    A third-party BOM detector wouldn't do anything different. This way you're also covered if cURL starts stripping unneeded BOMs at some later time.
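As a rough illustration, the stripping loop can be wrapped in a small helper and followed by json_decode. This is only a sketch: the function name decode_json_without_bom is made up for this example, and the $response string is a simulated body standing in for the value returned by curl_exec($ch).

```php
<?php
// Hypothetical helper name; it is not part of cURL or PHP itself.
function decode_json_without_bom(string $data)
{
    // Same three bytes as pack('CCC', 239, 187, 191).
    $bom = "\xEF\xBB\xBF";

    // Strip every leading BOM, in case more than one was prepended.
    while (strncmp($data, $bom, 3) === 0) {
        $data = (string) substr($data, 3);
    }

    // Decode to an associative array; returns null if the JSON is invalid.
    return json_decode($data, true);
}

// Simulated response body (assumption: stands in for curl_exec($ch)).
$response = "\xEF\xBB\xBF" . '{"field_1":"text_1","field_2":"text_2","field_3":"text_3"}';

$rawResult = json_decode($response, true);        // fails because of the BOM
$decoded   = decode_json_without_bom($response);  // succeeds after stripping
```

If the response has no BOM, the loop body never runs and the data is decoded unchanged.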

    Possible causes

    Some JSON optimizers and filters may decide the output requires a BOM. Also, perhaps more simply, whoever wrote the script generating the JSON may have inadvertently saved it with a BOM before the opening PHP tag. PHP treats anything before the opening tag as literal output, so the BOM is sent along with the response without ever being visible to the script's own code. This can occasionally also cause the "Cannot modify header information - headers already sent" warning.
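If you have access to the server side, one way to check for the second cause is to look at the first three bytes of the script file. The following is a minimal sketch; file_starts_with_bom is a hypothetical helper, and the temporary files only stand in for the real script that generates the JSON.

```php
<?php
// Hypothetical helper: reports whether a file begins with the UTF-8 BOM.
function file_starts_with_bom(string $path): bool
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false; // unreadable file: treated as "no BOM found"
    }
    $head = fread($handle, 3);
    fclose($handle);

    return $head === "\xEF\xBB\xBF";
}

// Demo files (assumption: they stand in for the script that emits the JSON).
$withBom = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($withBom, "\xEF\xBB\xBF<?php echo json_encode(['a' => 1]);");

$clean = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($clean, "<?php echo json_encode(['a' => 1]);");

$withBomResult = file_starts_with_bom($withBom);
$cleanResult   = file_starts_with_bom($clean);

// Remove the demo files again.
unlink($withBom);
unlink($clean);
```

A file that reports true here should be re-saved as "UTF-8 without BOM" in the editor that produced it.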

    Content detection

    You can verify that the JSON is valid UTF-8, BOM or no BOM, but you need the mbstring extension, and you must use strict mode to catch some edge cases:

    if (false === mb_detect_encoding($data, 'UTF-8', true)) {
        // JSON contains invalid sequences (erroneously NOT JSON encoded)
    }
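The encoding check can also be combined with decoding so that the caller learns why a response was rejected. This is a sketch under assumptions: checked_json_decode is a name invented for this example, it uses mb_check_encoding (also from the mbstring extension) rather than mb_detect_encoding, and it only reports problems without trying to repair them.

```php
<?php
// Hypothetical helper: returns [decoded value, error message or null].
function checked_json_decode(string $data): array
{
    // Reject byte sequences that are not valid UTF-8.
    if (!mb_check_encoding($data, 'UTF-8')) {
        return [null, 'response is not valid UTF-8'];
    }

    $decoded = json_decode($data, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // Valid UTF-8 but not valid JSON (for example, a leading BOM).
        return [null, json_last_error_msg()];
    }

    return [$decoded, null];
}

// Valid JSON, invalid UTF-8, and valid UTF-8 with a leading BOM.
[$ok, $okError]   = checked_json_decode('{"field_1":"text_1"}');
[$bad, $badError] = checked_json_decode("{\"field_1\":\"\xFF\"}");
[$bom, $bomError] = checked_json_decode("\xEF\xBB\xBF" . '{"field_1":"text_1"}');
```

Note that a BOM-prefixed response passes the UTF-8 check and fails only at the JSON step, which is why the BOM still has to be stripped separately.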
    

    I would advise against trying to correct a possible encoding error; you risk breaking your own code, and also having to maintain someone else's work.

    Selected by the asker as the best answer.