dongyi1939 2017-10-25 10:55

PHP: When should unescaped UTF-8 be saved to a JSON file?

Is there any benefit to saving UTF-8 characters unescaped in a JSON file if one only accesses them through PHP?

Here is what I tested:

fwrite(fopen('fileA.json','w'), json_encode('аккредитовать'));  

then the content of fileA.json is given by

"\u0430\u043a\u043a\u0440\u0435\u0434\u0438\u0442\u043e\u0432\u0430\u0442\u044c"

However, when I store it with

fwrite(fopen('fileB.json','w'), json_encode('аккредитовать', JSON_UNESCAPED_UNICODE));

the content of fileB.json is given by

"аккредитовать"

To my surprise each of the following calls

echo json_decode(file_get_contents('fileA.json'));
echo json_decode(file_get_contents('fileB.json'));
echo json_decode(file_get_contents('fileA.json'), false, 512, JSON_UNESCAPED_UNICODE);
echo json_decode(file_get_contents('fileB.json'), false, 512, JSON_UNESCAPED_UNICODE);

gives the same output:

'аккредитовать'

So I would conclude that I only need to save unescaped UTF-8 characters in a JSON file if I want to open and read the file directly in an editor. If I only plan to show or save the content of the JSON file with PHP, then I don't need to save the content unescaped, and I can use

fwrite(fopen('fileA.json','w'), json_encode('аккредитовать'));  
echo json_decode(file_get_contents('fileA.json'));

Is that correct, or did I miss anything important?

1 answer

  • dtng5978 2017-10-25 12:15

    With JSON_UNESCAPED_UNICODE the JSON is now:

    1. more human readable
    2. not ASCII-safe

    That's the only tradeoff you're making. Once you have non-ASCII characters in your JSON, you need to ensure the JSON is handled in a binary-safe manner; e.g. you cannot simply send it over a channel that expects only ASCII data, and you need to care about the specific encoding if a channel is encoding-aware (e.g. when storing it in a database). None of this is of any concern when simply writing the data to a file and then reading it again, as long as the reader treats the encoding correctly (which PHP does here: it reads and writes the file contents as raw bytes and does not transcode them).
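
    To illustrate the ASCII-safety point, here is a minimal sketch. The helper hasNonAscii() is a hypothetical name introduced only for this example, and the script assumes the source file itself is saved as UTF-8:

```php
<?php
// Sample value taken from the question; any non-ASCII string works.
$word = 'аккредитовать';

$escaped   = json_encode($word);                         // default: \uXXXX escapes
$unescaped = json_encode($word, JSON_UNESCAPED_UNICODE); // raw UTF-8 bytes

// Hypothetical helper: true when the string contains at least one byte
// outside the 7-bit ASCII range (no /u modifier, so matching is per byte).
function hasNonAscii(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) === 1;
}

var_dump(hasNonAscii($escaped));   // bool(false): safe for ASCII-only channels
var_dump(hasNonAscii($unescaped)); // bool(true): needs binary-safe handling
```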

    The JSON format itself doesn't care either way: "а" and "\u0430" represent the exact same character.
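
    A quick way to check this equivalence is to decode both spellings and compare the results. This is only a sketch with hypothetical variable names, again assuming a UTF-8 source file:

```php
<?php
// The same one-character JSON string, written two ways.
$escapedJson   = '"\u0430"'; // single quotes: PHP passes \u0430 through as literal text
$unescapedJson = '"а"';      // raw UTF-8 bytes for U+0430 (Cyrillic small letter a)

$a = json_decode($escapedJson);
$b = json_decode($unescapedJson);

var_dump($a === $b);   // bool(true)
var_dump(bin2hex($a)); // string(4) "d0b0", the UTF-8 encoding of U+0430
```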

    It should be noted that escaped Unicode takes up more storage than UTF-8 encoded text (6-12 bytes per non-ASCII character vs. 2-4 bytes). But that hardly matters in the majority of cases.
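
    The size difference can be measured with strlen(), which counts bytes in PHP. The sketch below uses the word from the question plus an emoji chosen here as an example of a character outside the Basic Multilingual Plane; both counts include the two surrounding quote characters:

```php
<?php
$word  = 'аккредитовать'; // 13 Cyrillic characters, 2 bytes each in UTF-8
$emoji = '😀';            // U+1F600, 4 bytes in UTF-8, a surrogate pair when escaped

// 13 * 6 + 2 quotes = 80 bytes escaped, 13 * 2 + 2 quotes = 28 bytes unescaped
var_dump(strlen(json_encode($word)));                          // int(80)
var_dump(strlen(json_encode($word, JSON_UNESCAPED_UNICODE)));  // int(28)

// 12 + 2 quotes = 14 bytes escaped, 4 + 2 quotes = 6 bytes unescaped
var_dump(strlen(json_encode($emoji)));                         // int(14)
var_dump(strlen(json_encode($emoji, JSON_UNESCAPED_UNICODE))); // int(6)
```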

    Note also: JSON_UNESCAPED_UNICODE is not a valid flag for json_decode; it's simply superfluous there.
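
    As a small check of that last point, the sketch below decodes the same input with and without the flag. It assumes json_decode silently ignores flag bits it does not define, which matches the behavior described in the question:

```php
<?php
$json = '"\u0430\u043a"'; // escaped JSON for the two characters "ак"

$plain    = json_decode($json);
$withFlag = json_decode($json, false, 512, JSON_UNESCAPED_UNICODE);

// The encode-only flag makes no difference to the decoded value.
var_dump($plain === $withFlag); // bool(true)
```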

