I cannot solve this issue and it is driving me crazy.
json_encode() fails with the error "Malformed UTF-8 characters, possibly incorrectly encoded" on a few records (2 or 3) out of a set of 10k records.
Yet it seems impossible to fix:
- MySQL is already utf8mb4 everywhere (database, tables, columns and collation)
- PHP is 7.2 and, of course, set to UTF-8
- Apache's default charset is UTF-8 (although the error occurs at the PHP level).
I can print the record to the screen without any issue in a simple HTML debug page. However, if I try to encode it as JSON, I get the error.
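A likely reason the debug page looks fine: a browser usually renders an invalid byte as � without complaining, while json_encode() validates UTF-8 strictly and refuses the whole value. To find which of the 10k records are affected, a minimal sketch like the one below could scan each field with mb_check_encoding() and dump the raw bytes with bin2hex(). It assumes rows come back as associative arrays (e.g. with PDO::FETCH_ASSOC); `findInvalidUtf8` and the sample rows are hypothetical.

```php
<?php
// Hypothetical helper: list [rowIndex, field, hexBytes] for every string
// field that is not valid UTF-8.
function findInvalidUtf8(array $rows): array
{
    $problems = [];
    foreach ($rows as $rowIndex => $row) {
        foreach ($row as $field => $value) {
            if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
                // bin2hex() exposes the raw bytes that a browser would hide
                // behind a replacement character.
                $problems[] = [$rowIndex, $field, bin2hex($value)];
            }
        }
    }
    return $problems;
}

// Illustrative data only: the second row contains a lone 0xA0 byte.
$rows = [
    ['id' => 1, 'name' => 'RESIDENCE PRINCIPE'],
    ['id' => 2, 'name' => "RESIDENCE \xA0PRINCIPE"],
];
print_r(findInvalidUtf8($rows));
```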
I found that these records were imported from a CSV file, probably bypassing the cleaner. What is strange is that the entire CSV file is passed through this:
$this->encoding = mb_detect_encoding($source, mb_detect_order(), true);
if ($this->encoding != "" && $this->encoding != "UTF8") {
    $source = iconv($this->encoding, "UTF-8", $source);
}
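Two details of this check may explain how the bad bytes slipped through. mb_detect_encoding() reports the name "UTF-8" (with a hyphen), so the comparison with "UTF8" never matches. More importantly, in strict mode it returns false when none of the candidate encodings fits; if the detection order is left at its default (typically just ASCII and UTF-8), a file containing a Latin-1 byte fits neither, and because `false != ""` is false under PHP's loose comparison, iconv() is silently skipped. A minimal sketch of an alternative, assuming that any input which is not valid UTF-8 is Windows-1252 (where 0xA0 is the no-break space); `toUtf8` is a hypothetical name:

```php
<?php
// Hypothetical helper: normalise a raw CSV string to UTF-8.
// ASSUMPTION: input that is not already valid UTF-8 is Windows-1252,
// which is common for CSV files exported from Excel on Windows.
function toUtf8(string $source): string
{
    // Valid UTF-8 (pure ASCII included) is returned untouched.
    if (mb_check_encoding($source, 'UTF-8')) {
        return $source;
    }
    // Otherwise re-encode from the assumed single-byte encoding.
    return mb_convert_encoding($source, 'UTF-8', 'Windows-1252');
}

$clean = toUtf8("RESIDENCE \xA0PRINCIPE");
echo bin2hex($clean), PHP_EOL;   // the single byte a0 is now c2a0
```

Validating with mb_check_encoding() rather than guessing is deliberate: almost any byte string is "valid" in a single-byte encoding, so detection among those encodings is guesswork, and the source encoding has to be an explicit assumption.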
I cannot post the full broken data for privacy reasons (GDPR). However, I managed to extract the part that seems to be broken:
RESIDENCE �PRINCIPE
UPDATES
I tried to get the byte values of these broken characters. This is what I found.
Reading the string byte by byte with the native functions str_split and ord, the broken character is:
'�' 160 (0xA0)
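The failure can be reproduced in isolation with a string modelled on the extracted fragment (the exact surrounding characters are an assumption). The sketch shows json_encode() returning false with the same message, and the JSON_INVALID_UTF8_SUBSTITUTE flag (available since PHP 7.2) that replaces bad bytes with U+FFFD instead of failing. That flag only hides the symptom; it does not recover the original character.

```php
<?php
// Fragment modelled on the broken record, containing the single 0xA0 byte.
$name = "RESIDENCE \xA0PRINCIPE";

// Default behaviour: json_encode() returns false on invalid UTF-8.
$json = json_encode(['name' => $name]);
var_dump($json);                       // bool(false)
echo json_last_error_msg(), PHP_EOL;   // Malformed UTF-8 characters, possibly incorrectly encoded

// PHP >= 7.2: substitute U+FFFD for the invalid byte instead of failing.
$lossy = json_encode(['name' => $name], JSON_INVALID_UTF8_SUBSTITUTE);
echo $lossy, PHP_EOL;
```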
I also wanted to find its code in UTF-8, so I found this useful function on PHP.net, http://php.net/manual/en/function.ord.php#109812, which tries to compute the code of multibyte characters. It gives me:
-2096
Which is... negative?
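A negative result is less surprising once you notice that, in UTF-8, bytes 0x80 to 0xBF are continuation bytes: they may only follow a lead byte and can never start a character. A lone 0xA0 is therefore not a UTF-8 character at all, and a decoder that does not validate its input can return an arbitrary number for it (I have not checked exactly what the php.net helper does internally). A minimal sketch that classifies a byte and only decodes well-formed input with mb_ord() (PHP >= 7.2); `utf8ByteRole` and `utf8CodePoint` are hypothetical names:

```php
<?php
// Hypothetical helper: describe the role a byte can play in UTF-8.
function utf8ByteRole(int $byte): string
{
    if ($byte < 0x80) return 'ASCII (single byte)';
    if ($byte < 0xC0) return 'continuation byte (cannot start a character)';
    if ($byte < 0xC2) return 'invalid (overlong lead byte)';
    if ($byte < 0xE0) return 'lead byte of a 2-byte sequence';
    if ($byte < 0xF0) return 'lead byte of a 3-byte sequence';
    if ($byte < 0xF5) return 'lead byte of a 4-byte sequence';
    return 'invalid in UTF-8';
}

// Hypothetical helper: code point of exactly one valid UTF-8 character,
// or null if the input is not a single well-formed character.
function utf8CodePoint(string $char): ?int
{
    if (!mb_check_encoding($char, 'UTF-8') || mb_strlen($char, 'UTF-8') !== 1) {
        return null;
    }
    return mb_ord($char, 'UTF-8');
}

echo utf8ByteRole(0xA0), PHP_EOL;   // continuation byte (cannot start a character)
var_dump(utf8CodePoint("\xA0"));     // NULL: not valid UTF-8 on its own
var_dump(utf8CodePoint("\xC2\xA0")); // int(160): U+00A0 NO-BREAK SPACE
```

The same no-break space correctly encoded in UTF-8 is the two bytes C2 A0, so seeing a bare A0 suggests the text was stored in a single-byte encoding rather than UTF-8.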