PHP json_encode() fails with "Malformed UTF-8 characters, possibly incorrectly encoded"

I cannot solve this issue and it is driving me crazy.

json_encode() raises the error "Malformed UTF-8 characters, possibly incorrectly encoded" on a few records (2 or 3) out of a set of 10k records. So far this has seemed impossible to fix.

MySQL is already utf8mb4 everywhere (database, tables, columns and collation).
PHP is 7.2 and of course uses UTF-8.
Apache's default charset is UTF-8 (however, the error is thrown at the PHP level).

I can also print the record correctly in PHP on a simple HTML debug page. However, if I try to encode it as JSON I get the error.

I found that these records were imported from a CSV file, probably bypassing the cleaner. What is strange is that the entire CSV file is parsed with:

$this->encoding = mb_detect_encoding($source,mb_detect_order(),true);
if ($this->encoding!="" && $this->encoding!="UTF8") {
    $source = iconv($this->encoding, "UTF-8", $source);
}

I cannot post the full broken data for privacy reasons (GDPR). However, I managed to extract the part that seems to be broken:

RESIDENCE �PRINCIPE

UPDATES

I tried to get the byte value of the broken character. Reading the string byte by byte with the native functions str_split and ord, the character is:

'�' 160

I also wanted the code point in UTF-8, so I found this useful function on PHP.net, http://php.net/manual/en/function.ord.php#109812, which tries to compute the code point of multibyte strings. It gives me:

-2096

Which is... negative?
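For reference, here is a minimal sketch of what these two numbers suggest. It assumes the broken byte is a lone 0xA0 (decimal 160, the non-breaking space in ISO-8859-1 and Windows-1252) followed by the letter P, as in the excerpt above; the sample string is hypothetical. It also assumes the PHP.net helper decodes any byte between 128 and 223 as the lead byte of a two-byte sequence, which would explain the negative result.

```php
<?php
// Hypothetical sample reproducing the excerpt: a raw 0xA0 byte before "PRINCIPE".
$sample = "RESIDENCE \xA0PRINCIPE";

// 0xA0 is a UTF-8 continuation byte; standing alone it is not valid UTF-8.
$isValidUtf8 = mb_check_encoding($sample, 'UTF-8');

// json_encode() rejects the string and reports JSON_ERROR_UTF8.
$json = json_encode($sample);
$isUtf8Error = (json_last_error() === JSON_ERROR_UTF8);

// Assumed arithmetic of a naive two-byte decoder: (lead - 192) * 64 + (next - 128).
$lead = ord("\xA0");   // 160
$next = ord('P');      // 80
$naiveCodePoint = ($lead - 192) * 64 + ($next - 128);

var_dump($isValidUtf8, $json, $isUtf8Error, $naiveCodePoint);
```

Under these assumptions the arithmetic yields -2096, so the negative number is a symptom of the invalid byte rather than a real code point.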


2 Answers

dsqe46004 2018-05-31 09:32


SOLVED!

The issue was in mb_detect_order(): this function just doesn't work the way I expected. I thought it returned the full list of supported encodings, ordered by how commonly they are used in order to speed up detection.

But I just found that this function returns only 2 encodings:

//print_r(mb_detect_order());
Array
(
    [0] => ASCII
    [1] => UTF-8
)

This is almost completely useless in my case. The mb functions can detect many more charsets. You can check them by running mb_list_encodings() to get the full list:

//print_r(mb_list_encodings());
Array
(
    [0] => pass
    [1] => auto
    [2] => wchar
    [3] => byte2be
    [4] => byte2le
    [5] => byte4be
    [6] => byte4le
    [7] => BASE64
    [8] => UUENCODE
    [9] => HTML-ENTITIES
    [10] => Quoted-Printable
    [11] => 7bit
    [12] => 8bit
    [13] => UCS-4
    [14] => UCS-4BE
    [15] => UCS-4LE
    [16] => UCS-2
    [17] => UCS-2BE
    [18] => UCS-2LE
    [19] => UTF-32
    [20] => UTF-32BE
    [21] => UTF-32LE
    [22] => UTF-16
    [23] => UTF-16BE
    [24] => UTF-16LE
    [25] => UTF-8
    [26] => UTF-7
    [27] => UTF7-IMAP
    [28] => ASCII
    [29] => EUC-JP
    [30] => SJIS
    [31] => eucJP-win
    [32] => EUC-JP-2004
    [33] => SJIS-win
    [34] => SJIS-Mobile#DOCOMO
    [35] => SJIS-Mobile#KDDI
    [36] => SJIS-Mobile#SOFTBANK
    [37] => SJIS-mac
    [38] => SJIS-2004
    [39] => UTF-8-Mobile#DOCOMO
    [40] => UTF-8-Mobile#KDDI-A
    [41] => UTF-8-Mobile#KDDI-B
    [42] => UTF-8-Mobile#SOFTBANK
    [43] => CP932
    [44] => CP51932
    [45] => JIS
    [46] => ISO-2022-JP
    [47] => ISO-2022-JP-MS
    [48] => GB18030
    [49] => Windows-1252
    [50] => Windows-1254
    [51] => ISO-8859-1
    [52] => ISO-8859-2
    [53] => ISO-8859-3
    [54] => ISO-8859-4
    [55] => ISO-8859-5
    [56] => ISO-8859-6
    [57] => ISO-8859-7
    [58] => ISO-8859-8
    [59] => ISO-8859-9
    [60] => ISO-8859-10
    [61] => ISO-8859-13
    [62] => ISO-8859-14
    [63] => ISO-8859-15
    [64] => ISO-8859-16
    [65] => EUC-CN
    [66] => CP936
    [67] => HZ
    [68] => EUC-TW
    [69] => BIG-5
    [70] => CP950
    [71] => EUC-KR
    [72] => UHC
    [73] => ISO-2022-KR
    [74] => Windows-1251
    [75] => CP866
    [76] => KOI8-R
    [77] => KOI8-U
    [78] => ArmSCII-8
    [79] => CP850
    [80] => JIS-ms
    [81] => ISO-2022-JP-2004
    [82] => ISO-2022-JP-MOBILE#KDDI
    [83] => CP50220
    [84] => CP50220raw
    [85] => CP50221
    [86] => CP50222
)

I was wrong to think that mb_detect_order() was just an ordered version of this list. It returns the currently configured detection order, which by default contains only ASCII and UTF-8, so for my data it is just... useless. To convert to UTF-8 correctly, use the following code:

$my_encoding_list = [
    "UTF-8",
    "UTF-7",
    "UTF-16",
    "UTF-32",
    "ISO-8859-16",
    "ISO-8859-15",
    "ISO-8859-10",
    "ISO-8859-1",
    "Windows-1254",
    "Windows-1252",
    "Windows-1251",
    "ASCII",
    // add your preferred encodings here
];

// remove encodings this PHP build does not support
$encoding_list = array_intersect($my_encoding_list, mb_list_encodings());

// detect the encoding against the explicit list (strict mode)
$this->encoding = mb_detect_encoding($source,$encoding_list,true);

This worked and solved my issue with bad data saved in the database.
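To illustrate why the original check let the bad bytes through and how an explicit list changes that, here is a minimal sketch. The sample string, the helper name toUtf8 and the short candidate list are assumptions for illustration, not the code from the importer. Single-byte encodings such as ISO-8859-1 accept any byte sequence, so the detection remains a best guess.

```php
<?php
// Hypothetical input: Latin-1 text containing a non-breaking space (byte 0xA0).
$source = "RESIDENCE \xA0PRINCIPE";

// With only ASCII and UTF-8 as candidates, strict detection fails and returns false.
$detected = mb_detect_encoding($source, ['ASCII', 'UTF-8'], true);

// false != "" is false under loose comparison, so a check written as
// ($encoding != "" && $encoding != "UTF8") never reaches the conversion.
$wouldConvert = ($detected != "" && $detected != "UTF8");

// Hypothetical helper: return $source as valid UTF-8, guessing from $candidates.
function toUtf8(string $source, array $candidates): string
{
    // Already valid UTF-8: nothing to do.
    if (mb_check_encoding($source, 'UTF-8')) {
        return $source;
    }
    // Keep only encodings this PHP build supports.
    $supported = array_values(array_intersect($candidates, mb_list_encodings()));
    $encoding = mb_detect_encoding($source, $supported, true);
    if ($encoding === false) {
        throw new InvalidArgumentException('Could not detect the source encoding');
    }
    return mb_convert_encoding($source, 'UTF-8', $encoding);
}

$clean = toUtf8($source, ['UTF-8', 'ISO-8859-1', 'Windows-1252']);
$json  = json_encode(['name' => $clean]);

var_dump($detected, $wouldConvert, bin2hex($clean), $json);
```

Failing loudly when detection returns false, instead of silently skipping the conversion, is a design choice in this sketch so that unconvertible records are noticed at import time rather than at json_encode() time.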

This answer was accepted by the asker as the best answer.
