Character encoding UTF-8 to Latin-1: explaining these two characters

I have a database which uses latin-1 and a PHP application which is utf-8.

I have strings in the database like this:

'SociÃ©tÃ©' which should be Société

'â‚¬1bn' which should be €1bn.

When I run PHP's ord() on the faulty characters in the data returned from the db, it prints 195 and 226.

Could somebody explain why this is happening (why the strings are saved like this and why the characters are read back as they are), and whether I can reverse it?

dpjuppr1361 2013-01-28 14:06
The WHY:

1) é is Unicode code point 233 (as the browser reads it).
In UTF-8, é is the two bytes 195 169 (0xC3 0xA9). Read one byte at a time as Latin-1 characters, those bytes are Ã and ©. This is why it appears as Ã© in the database.
ord() only looks at the first byte of a string, which here is 195 (Ã). Hence why you see that.

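2) € is Unicode code point 8364.
In UTF-8, € is the three bytes 226 130 172 (0xE2 0x82 0xAC). Read as Latin-1 characters, those bytes are â, <82> and ¬. Byte 0x82 is a control character in strict ISO-8859-1 but is ‚ in Windows-1252, which is why it shows as â‚¬ in your db.
Again ord() returns the first byte, 226 (â). This is why you see this.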

That is why you see those values from ord() and why the characters are stored in that manner in a latin-1 database.
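
The byte values above can be checked with a short sketch. It is written in Python rather than PHP purely for illustration. The helper name misread_as_cp1252 is made up for this example, and "cp1252" (Windows-1252) is assumed to stand in for the database's latin1.

```python
# Sketch: how UTF-8 bytes turn into mojibake when read as a single-byte charset.
# Assumption: "cp1252" (Windows-1252) stands in for the database's latin1.


def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    for original in ("é", "€"):
        raw = original.encode("utf-8")
        # code point, UTF-8 byte values, and what a latin1 reader would display
        print(original, ord(original), list(raw), misread_as_cp1252(original))
```

The first number in each byte list (195 for é, 226 for €) is the value that ord() reports in PHP, since it only inspects the first byte of the string.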

Reverse:

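To reverse it we need to turn the Latin-1 characters back into the bytes they came from, and then read those bytes as UTF-8.

If we try it the other way round, converting Latin-1 to UTF-8 a second time:
â is 226. Converted latin-1 to utf8 produces Ã¢.
Ã is 195. Converted latin-1 to utf8 produces Ãƒ.

Problem:

The problem is that Latin-1 has far fewer characters than UTF-8.
Latin-1 is a single-byte stream and UTF-8 is a multi-byte stream, so 1 character in UTF-8 can show up as up to 4 characters in Latin-1.
So reading UTF-8 bytes as Latin-1 produces faulty characters, and converting Latin-1 to UTF-8 again only garbles them further.
The reversal has to go in the opposite direction (UTF-8 to Latin-1, which gives back the original bytes), and it only works if every byte survived storage; characters that have no Latin-1 equivalent and were replaced by ? on the way in cannot be recovered.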
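
As a rough illustration of the two directions, here is a Python sketch. The function names double_encode and repair are hypothetical, and "cp1252" is again assumed to be what the database calls latin1. In PHP the repair step would presumably be something like mb_convert_encoding($s, 'Windows-1252', 'UTF-8'), but that has not been tested against your data.

```python
# Sketch: converting the garbled text a second time versus undoing the misreading.
# Assumption: the garbled strings reached the application intact, with no bytes lost.


def double_encode(mojibake: str) -> str:
    """Wrong direction: treat the garbled text as Latin-1 and convert to UTF-8 again."""
    return mojibake.encode("utf-8").decode("cp1252")


def repair(mojibake: str) -> str:
    """Right direction: recover the original bytes, then decode them as UTF-8."""
    return mojibake.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    print(double_encode("â"), double_encode("Ã"))      # garbled one level deeper
    print(repair("SociÃ©tÃ©"), repair("â‚¬1bn"))      # back to the intended text
```

repair() raises UnicodeEncodeError or UnicodeDecodeError when the input is not this kind of mojibake (for example a string that was already correct), so it is safer to run it only on rows known to be affected, or to catch those errors and leave such rows alone.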

Solution:

If you are unable to change the character set of your database, I would suggest storing special characters as HTML character entities before writing them (the db can stay latin1 and the app utf8, since entities are plain ASCII that both understand), e.g. Ä as &Auml;.
This could be done with PHP's htmlentities() before writing and html_entity_decode() after reading, possibly combined with mb_detect_encoding() to check what encoding a string is in first.
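
A minimal sketch of the entity round trip, in Python for illustration. It uses numeric character references (&#196;) rather than named entities (&Auml;), which is a substitution made here for simplicity; to_entities and from_entities are hypothetical helper names.

```python
# Sketch: keep stored text pure ASCII by writing non-ASCII characters as
# numeric character references, and turn them back when reading.
import html


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(stored: str) -> str:
    """Turn character references back into the characters they stand for."""
    return html.unescape(stored)


if __name__ == "__main__":
    stored = to_entities("Société €1bn")
    print(stored)                 # ASCII only, safe in a latin1 column
    print(from_entities(stored))  # original text again
```

One caveat with this approach: from_entities() also decodes references that were already in the text (such as a literal &amp;), so data that legitimately contains entity-like sequences would need to be escaped before storing.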

References:

See ltg.ed.ac.uk for the utf8 char bytes to latin1 bytes:
http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=%C3%96&mode=char
