dongse5408
2018-05-30 18:07
605 views
Answer accepted

PHP JSON_encode() fails with "Malformed UTF-8 characters, possibly incorrectly encoded"

I cannot solve this issue and it's driving me crazy.

JSON_encode() fails with the error "Malformed UTF-8 characters, possibly incorrectly encoded" on a few records (2 or 3) out of a set of 10k records, and I can't find any way to fix it.
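A minimal sketch that reproduces the failure with a stand-in string. The raw 0xA0 byte is an assumption taken from the fragment and the byte value reported in the update. In PHP 7.2, json_encode() does not throw: it returns false, and the message is available from json_last_error_msg(). PHP 7.2 also added the JSON_INVALID_UTF8_SUBSTITUTE flag, which replaces invalid bytes with U+FFFD instead of failing.

```php
<?php
// Hypothetical record: "RESIDENCE", one raw 0xA0 byte, then "PRINCIPE".
// A lone 0xA0 is not valid UTF-8, which is what makes json_encode() fail.
$record = ['address' => "RESIDENCE\xA0PRINCIPE"];

$json = json_encode($record);
if ($json === false) {
    // json_encode() signals failure by returning false; the reason is stored separately.
    echo json_last_error_msg(), "\n"; // Malformed UTF-8 characters, possibly incorrectly encoded
}

// PHP >= 7.2: substitute each invalid byte with U+FFFD instead of failing.
// The output becomes valid JSON, but the data is silently altered.
$substituted = json_encode($record, JSON_INVALID_UTF8_SUBSTITUTE);
echo $substituted, "\n";
```

The substitute flag is handy for keeping an API responsive, but it hides the underlying data problem rather than fixing it.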

  • MySQL is already utf8mb4 everywhere (database, tables, columns and collation)
  • PHP is 7.2 and, of course, set to UTF-8
  • the Apache default charset is UTF-8 (although the error is raised at the PHP level).

I can also print the record to screen from PHP without any issue in a simple HTML debug page. However, if I try to encode it as JSON, I get the error.
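An HTML page can hide the problem: browsers typically render an invalid byte as the replacement character �, whereas json_encode() validates strictly. A minimal sketch for pinpointing the 2 or 3 offending records among the 10k, assuming rows are fetched as associative arrays; the helper name and the sample rows are hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns [rowIndex => [fieldName, ...]] for every
 * string field that is not valid UTF-8.
 */
function findInvalidUtf8Fields(array $rows): array
{
    $bad = [];
    foreach ($rows as $i => $row) {
        foreach ($row as $field => $value) {
            // Only strings can carry malformed bytes; ints/nulls are skipped.
            if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
                $bad[$i][] = $field;
            }
        }
    }
    return $bad;
}

// Hypothetical sample rows.
$rows = [
    ['id' => 1, 'address' => 'RESIDENCE LES PINS'],
    ['id' => 2, 'address' => "RESIDENCE\xA0PRINCIPE"], // raw 0xA0 byte
    ['id' => 3, 'address' => "R\u{00C9}SIDENCE"],      // valid UTF-8 "É"
];

print_r(findInvalidUtf8Fields($rows)); // only row 1, field "address"
```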

I found that these records were imported from a CSV file, probably bypassing the cleaner. What is so strange is that the entire CSV file is processed with:

$this->encoding = mb_detect_encoding($source,mb_detect_order(),true);
if ($this->encoding!="" && $this->encoding!="UTF8") {
    $source = iconv($this->encoding, "UTF-8", $source);
} 
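A sketch of how the cleaner above can let the byte through, assuming the usual default detection order of ASCII then UTF-8 (passed explicitly here so the result does not depend on php.ini). In strict mode, mb_detect_encoding() returns false when no candidate encoding fits, and in PHP the loose comparison false != "" is itself false, so the iconv() branch never runs and the raw bytes are kept unchanged. Separately, the code compares against "UTF8", while mb_detect_encoding() reports "UTF-8". The toUtf8() helper is hypothetical, and Windows-1252 as the CSV's real encoding is an assumption the question does not confirm.

```php
<?php
// The question's detection step, applied to a hypothetical sample with a raw 0xA0.
$source = "RESIDENCE\xA0PRINCIPE";
$detected = mb_detect_encoding($source, ['ASCII', 'UTF-8'], true);
var_dump($detected);        // bool(false): neither ASCII nor UTF-8 accepts 0xA0
var_dump($detected != "");  // bool(false): loose comparison treats "" as false

/**
 * Hypothetical replacement for the cleaner: keep valid UTF-8 as-is,
 * otherwise convert from an ASSUMED legacy encoding (Windows-1252 by default).
 */
function toUtf8(string $source, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($source, 'UTF-8')) {
        return $source;
    }
    return mb_convert_encoding($source, 'UTF-8', $fallback);
}

$clean = toUtf8($source);
var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)
echo json_encode(['address' => $clean]), "\n";
```

Checking validity with mb_check_encoding() instead of relying on detection avoids the false/"" trap. If one file mixes UTF-8 and Windows-1252 bytes, converting the whole file would double-encode its UTF-8 parts, so converting field by field is safer in that case.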

I cannot post the full broken data because of privacy (and GDPR). However, I managed to extract a fragment that seems to be the broken part:

RESIDENCE �PRINCIPE

UPDATES

I tried to get the byte values of these broken characters, and this is what I found. Byte by byte, using the native functions str_split and ord, this character is:

'�' 160
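Decoding the reported value: 160 is 0xA0, or 10100000 in binary. In UTF-8, bytes 0x80 to 0xBF are continuation bytes that may only follow a lead byte, so a lone 0xA0 is malformed. In ISO-8859-1 and Windows-1252 the same byte is U+00A0 NO-BREAK SPACE, which would fit its position between "RESIDENCE" and "PRINCIPE". That points towards, but does not prove, a single-byte source encoding; this sketch assumes Windows-1252 and compares the bytes before and after conversion.

```php
<?php
$broken = "\xA0"; // the single byte that ord() reported as 160
// Assumption: the CSV was saved as Windows-1252, where 0xA0 means U+00A0.
$fixed = mb_convert_encoding($broken, 'UTF-8', 'Windows-1252');

printf("raw byte: %d = 0x%02X = %08b\n", ord($broken), ord($broken), ord($broken));
printf("valid UTF-8 before: %s\n", mb_check_encoding($broken, 'UTF-8') ? 'yes' : 'no');
printf("UTF-8 bytes after:  %s\n", bin2hex($fixed));
printf("code point after:   U+%04X\n", mb_ord($fixed, 'UTF-8'));
// prints:
// raw byte: 160 = 0xA0 = 10100000
// valid UTF-8 before: no
// UTF-8 bytes after:  c2a0
// code point after:   U+00A0
```

The same character takes one byte in Windows-1252 but two bytes (C2 A0) in UTF-8, which is why the stored value only becomes valid after conversion.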

I would also like to find its UTF-8 code, so I found this useful function on PHP.net, http://php.net/manual/en/function.ord.php#109812, which tries to find the code of multibyte strings. It gives me:

-2096

Which is... negative?
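I can't verify how the contributed function at that link arrives at -2096, but a decoder that assumes well-formed input can only produce a meaningless number for a stray continuation byte, because a lone 0xA0 has no code point in UTF-8. A minimal strict decoder, written only for illustration (the function name is hypothetical), returns the first code point or null when the bytes are malformed:

```php
<?php
/**
 * Hypothetical strict decoder: returns the code point of the first character
 * in $s, or null if $s does not start with a well-formed UTF-8 sequence.
 * Kept short: it does not reject every overlong form, surrogates,
 * or values above U+10FFFF.
 */
function firstCodePoint(string $s): ?int
{
    if ($s === '') {
        return null;
    }
    $b0 = ord($s[0]);
    if ($b0 < 0x80) {
        return $b0;                             // 0xxxxxxx: ASCII
    }
    if ($b0 >= 0xC2 && $b0 <= 0xDF) {          // 110xxxxx: 2-byte lead
        $len = 2;
        $cp = $b0 & 0x1F;
    } elseif ($b0 >= 0xE0 && $b0 <= 0xEF) {    // 1110xxxx: 3-byte lead
        $len = 3;
        $cp = $b0 & 0x0F;
    } elseif ($b0 >= 0xF0 && $b0 <= 0xF4) {    // 11110xxx: 4-byte lead
        $len = 4;
        $cp = $b0 & 0x07;
    } else {
        return null;                            // continuation byte (0x80-0xBF) or invalid lead
    }
    if (strlen($s) < $len) {
        return null;                            // truncated sequence
    }
    for ($i = 1; $i < $len; $i++) {
        $b = ord($s[$i]);
        if (($b & 0xC0) !== 0x80) {
            return null;                        // expected 10xxxxxx
        }
        $cp = ($cp << 6) | ($b & 0x3F);         // append 6 payload bits
    }
    return $cp;
}

var_dump(firstCodePoint("\xA0"));     // NULL: stray continuation byte
var_dump(firstCodePoint("\xC2\xA0")); // int(160): U+00A0 NO-BREAK SPACE
```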
