PHP json_encode() fails with "Malformed UTF-8 characters, possibly incorrectly encoded"

I cannot solve this issue and it is driving me crazy.

json_encode() is raising the error "Malformed UTF-8 characters, possibly incorrectly encoded" on a few records (2 or 3) out of a set of 10k records. So far this has seemed impossible to fix.

MySQL is already utf8mb4 everywhere (database, tables, columns and collation).
PHP is 7.2 and of course uses UTF-8.
The Apache default charset is UTF-8 (however, the error is thrown at the PHP level).

I can also print the record correctly from PHP in a simple HTML debug page. However, if I try to encode it as JSON, I get the error.

I found that these records were imported from a CSV file, probably bypassing the cleaner. What is strange is that the entire CSV file is parsed with:

$this->encoding = mb_detect_encoding($source,mb_detect_order(),true);
if ($this->encoding!="" && $this->encoding!="UTF8") {
    $source = iconv($this->encoding, "UTF-8", $source);
}

I cannot post the full broken data for privacy reasons (GDPR). However, I managed to extract the part that seems to be broken:

RESIDENCE �PRINCIPE

UPDATES

I tried to get the byte values of these broken chars. This is what I found. Reading the string byte by byte with the native functions str_split and ord, the char is:

'�' 160

I would also like to find the code point in UTF-8, so I found this useful function on PHP.net, http://php.net/manual/en/function.ord.php#109812, which tries to compute the code point of multibyte strings. It gives me:

-2096

Which is... negative?
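
Byte 160 is 0xA0, which is a non-breaking space in ISO-8859-1 and Windows-1252 but is not valid UTF-8 on its own (in UTF-8 it can only appear as a continuation byte). The following is a minimal sketch for inspecting such a string byte by byte. The sample string is reconstructed from the fragment above, on the assumption that the broken char is the single raw byte 0xA0:

```php
<?php
// Assumption: the broken character is the single raw byte 0xA0 (decimal 160).
$sample = "RESIDENCE \xA0PRINCIPE";

// Is the whole string valid UTF-8?
$isValidUtf8 = mb_check_encoding($sample, 'UTF-8');

// Collect the offset and hex value of every non-ASCII byte.
$nonAscii = [];
foreach (str_split($sample) as $offset => $byte) {
    if (ord($byte) >= 0x80) {
        $nonAscii[$offset] = bin2hex($byte);
    }
}

var_dump($isValidUtf8); // bool(false)
print_r($nonAscii);     // offset 10 => "a0"
```

This may also explain the negative number. If the php.net helper uses the usual subtract-and-shift arithmetic, then treating 0xA0 as the lead byte of a two-byte sequence and the following "P" (0x50) as its continuation gives (160 - 192) * 64 + (80 - 128) = -2096.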


2 answers

dsqe46004 2018-05-31 09:32

SOLVED!

The issue was in the function mb_detect_order(): it just doesn't work the way I expected. I thought it returned the full list of supported encodings, ordered by how commonly they are used in order to speed up detection.

But I just found that this function returns only the currently configured detection order, which on my setup is just 2 encodings:

//print_r(mb_detect_order());
Array
(
    [0] => ASCII
    [1] => UTF-8
)
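
With only ASCII and UTF-8 as candidates, strict detection has nothing to fall back on for a Latin-1 byte, so it returns false. Because false loosely equals "", the guard in the import code then skips the conversion entirely. Here is a minimal sketch, assuming the sample bytes from the question and the two-entry detection order printed above:

```php
<?php
// Assumption: sample bytes reconstructed from the question (0xA0 is a Latin-1 NBSP).
$source = "RESIDENCE \xA0PRINCIPE";

// Same candidates that mb_detect_order() printed above.
$encoding = mb_detect_encoding($source, ['ASCII', 'UTF-8'], true);
var_dump($encoding); // bool(false): neither candidate matches

// The guard from the import code: false != "" evaluates to false, so iconv() never runs.
$wouldConvert = ($encoding != "" && $encoding != "UTF8");
var_dump($wouldConvert); // bool(false)

// The raw bytes pass through untouched and json_encode() later rejects them.
$json = json_encode($source);
var_dump($json);                  // bool(false)
echo json_last_error_msg(), "\n"; // Malformed UTF-8 characters, possibly incorrectly encoded
```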

Which is almost completely useless in my case. The mb functions can detect many more charsets. You can check them by running mb_list_encodings() to get the full list:

//print_r(mb_list_encodings());
Array
(
    [0] => pass
    [1] => auto
    [2] => wchar
    [3] => byte2be
    [4] => byte2le
    [5] => byte4be
    [6] => byte4le
    [7] => BASE64
    [8] => UUENCODE
    [9] => HTML-ENTITIES
    [10] => Quoted-Printable
    [11] => 7bit
    [12] => 8bit
    [13] => UCS-4
    [14] => UCS-4BE
    [15] => UCS-4LE
    [16] => UCS-2
    [17] => UCS-2BE
    [18] => UCS-2LE
    [19] => UTF-32
    [20] => UTF-32BE
    [21] => UTF-32LE
    [22] => UTF-16
    [23] => UTF-16BE
    [24] => UTF-16LE
    [25] => UTF-8
    [26] => UTF-7
    [27] => UTF7-IMAP
    [28] => ASCII
    [29] => EUC-JP
    [30] => SJIS
    [31] => eucJP-win
    [32] => EUC-JP-2004
    [33] => SJIS-win
    [34] => SJIS-Mobile#DOCOMO
    [35] => SJIS-Mobile#KDDI
    [36] => SJIS-Mobile#SOFTBANK
    [37] => SJIS-mac
    [38] => SJIS-2004
    [39] => UTF-8-Mobile#DOCOMO
    [40] => UTF-8-Mobile#KDDI-A
    [41] => UTF-8-Mobile#KDDI-B
    [42] => UTF-8-Mobile#SOFTBANK
    [43] => CP932
    [44] => CP51932
    [45] => JIS
    [46] => ISO-2022-JP
    [47] => ISO-2022-JP-MS
    [48] => GB18030
    [49] => Windows-1252
    [50] => Windows-1254
    [51] => ISO-8859-1
    [52] => ISO-8859-2
    [53] => ISO-8859-3
    [54] => ISO-8859-4
    [55] => ISO-8859-5
    [56] => ISO-8859-6
    [57] => ISO-8859-7
    [58] => ISO-8859-8
    [59] => ISO-8859-9
    [60] => ISO-8859-10
    [61] => ISO-8859-13
    [62] => ISO-8859-14
    [63] => ISO-8859-15
    [64] => ISO-8859-16
    [65] => EUC-CN
    [66] => CP936
    [67] => HZ
    [68] => EUC-TW
    [69] => BIG-5
    [70] => CP950
    [71] => EUC-KR
    [72] => UHC
    [73] => ISO-2022-KR
    [74] => Windows-1251
    [75] => CP866
    [76] => KOI8-R
    [77] => KOI8-U
    [78] => ArmSCII-8
    [79] => CP850
    [80] => JIS-ms
    [81] => ISO-2022-JP-2004
    [82] => ISO-2022-JP-MOBILE#KDDI
    [83] => CP50220
    [84] => CP50220raw
    [85] => CP50221
    [86] => CP50222
)

I was wrong in thinking that mb_detect_order() was just an ordered version of this list. For this purpose mb_detect_order() is just... useless. To detect the source encoding properly (so that it can then be converted to UTF-8), pass your own list of candidates:

$my_encoding_list = [
    "UTF-8",
    "UTF-7",
    "UTF-16",
    "UTF-32",
    "ISO-8859-16",
    "ISO-8859-15",
    "ISO-8859-10",
    "ISO-8859-1",
    "Windows-1254",
    "Windows-1252",
    "Windows-1251",
    "ASCII",
    // add your preferred encodings
];

// remove encodings that this mbstring build does not support
$encoding_list = array_intersect($my_encoding_list, mb_list_encodings());

// finally, detect the encoding (strict mode)
$this->encoding = mb_detect_encoding($source,$encoding_list,true);

This worked and solved my issue with bad data saved in the database.
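
As a hedged end-to-end sketch of the same idea: detect against an explicit candidate list, convert, and confirm that json_encode() now succeeds. The helper name to_utf8() and the short candidate list are assumptions for illustration only. Single-byte encodings such as ISO-8859-1 accept any byte sequence, so UTF-8 is listed first.

```php
<?php
/**
 * Convert $source to UTF-8 using an explicit candidate list.
 * Hypothetical helper for illustration; adjust $candidates to your data.
 */
function to_utf8(string $source, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Keep only encodings this mbstring build actually supports.
    $candidates = array_values(array_intersect($candidates, mb_list_encodings()));

    $detected = mb_detect_encoding($source, $candidates, true);
    if ($detected === false) {
        throw new RuntimeException('Could not detect the source encoding');
    }

    // Already UTF-8: return as-is; otherwise convert from the detected encoding.
    return $detected === 'UTF-8'
        ? $source
        : mb_convert_encoding($source, 'UTF-8', $detected);
}

$fixed = to_utf8("RESIDENCE \xA0PRINCIPE");
var_dump(mb_check_encoding($fixed, 'UTF-8')); // bool(true)
var_dump(json_encode($fixed) !== false);      // bool(true)
```

Detection is still a guess: several 8-bit encodings can match the same bytes, so the candidate list should reflect where the CSV files actually come from.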

This answer was accepted by the asker as the best answer.