dsxz84851 2010-12-26 21:08 Acceptance rate: 100%

Incorrect string encoding

Note: I have read all of the PHP, UTF-8, and character encoding articles that are usually suggested, but my question is about data that was inserted before I applied those techniques. I want to retrospectively fix all of the character encoding problems.

Now all connections are set as utf8 using PDO.

PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8'

Unfortunately, a large amount of data was inserted that is of questionable encoding before I had implemented correct character encoding practices. As displayed by:

$sql = "SELECT name FROM data LIMIT 3";

foreach ($pdo->query($sql) as $row)
{
    $name = $row['name'];

    echo $name . "\n";
    echo utf8_encode($name) . "\n";
    echo utf8_decode($name) . "\n";
    echo htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "\n";
    echo htmlspecialchars(utf8_encode($name), ENT_QUOTES, 'UTF-8') . "\n";
    echo htmlspecialchars(utf8_decode($name), ENT_QUOTES, 'UTF-8') . "\n";
    echo '<hr/>';
}

Which produces:

Antonín Dvořák
AntonÃÆín DvoÃâ¦Ãâ¢ÃÆák
Anton�?­n Dvo�?�?�?¡k
Antonín Dvořák
AntonÃÆín DvoÃâ¦Ãâ¢ÃÆák

----------
Ô±Ö€Õ¡Õ´ Ô½Õ¡Õ¹Õ¡Õ¿Ö€ÕµÕ¡Õ¶
ñÃâ¬Ã¡Ã´ ýáùáÿÃâ¬ÃµÃ¡Ã¶
Ա�?ամ Խաչատ�?յան
Ô±Ö€Õ¡Õ´ Ô½Õ¡Õ¹Õ¡Õ¿Ö€ÕµÕ¡Õ¶
ñÃâ¬Ã¡Ã´ ýáùáÿÃâ¬ÃµÃ¡Ã¶

----------
Tiësto
Tiësto
Tiësto
Tiësto
Tiësto
Tiësto
----------

When I remove 'SET NAMES utf8' from the PDO options, the output does contain the correct strings, albeit on different lines:

Antonín DvoÅák
Antonín DvoÃÂák
Antonín Dvořák
Antonín DvoÅák
Antonín DvoÃÂák
Antonín Dvořák
----------
Արամ Խաչատրյան
Ô±ÖÕ¡Õ´ Ô½Õ¡Õ¹Õ¡Õ¿ÖÕµÕ¡Õ¶
???? ?????????
Արամ Խաչատրյան
Ô±ÖÕ¡Õ´ Ô½Õ¡Õ¹Õ¡Õ¿ÖÕµÕ¡Õ¶
???? ?????????
----------
Tiësto
Tiësto
Ti�sto
Tiësto
Tiësto

----------

And here is a dump of the database rows concerned:

DROP TABLE IF EXISTS `data`;
CREATE TABLE IF NOT EXISTS `data` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(80) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `name` (`name`(10))
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=0;

INSERT INTO `data` (`id`, `name`) VALUES (0, 'Antonín Dvořák'), (1, 'Արամ Խաչատրյան'), (2, 'Tiësto');

The 3rd and 6th lines of the 3rd row "Tiësto" are then correctly echoed. I'm just unsure of the best way to detect the encoding of the bad strings and correct them.

2 answers

  • duanqianmou4661 2010-12-26 21:29

    One way that should work (I haven't tried this myself) is to dump the database into a file using phpMyAdmin, then import it again while specifying latin1 as the encoding, even though the file is UTF-8 encoded. (You need a phpMyAdmin version that lets you choose the character set of the dump file from a drop-down menu when importing.)

    This should turn Ã« back into ë. It can only work if the data is consistently broken, i.e. it is not a mix of valid UTF-8 characters and broken ones.
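
    To illustrate why the reverse conversion can work, here is a minimal PHP sketch of the round trip for a single string. It assumes the mbstring extension is available and that the damage came from UTF-8 bytes being read as ISO-8859-1 and encoded to UTF-8 a second time; the variable names are only for illustration. MySQL's latin1 is closer to Windows-1252, so characters whose bytes fall in the 0x80-0x9F range (such as the ř in the first row) may need different handling.

```php
<?php
// Sketch: how "ë" turns into "Ã«" and how the damage can be reversed.
// Assumes the mbstring extension is loaded.

// Correct UTF-8 bytes for "Tiësto" (ë is 0xC3 0xAB).
$correct = "Ti\xC3\xABsto";

// Simulate the damage: treat the UTF-8 bytes as ISO-8859-1
// and encode them to UTF-8 a second time.
$broken = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

// Reverse it: converting from UTF-8 back to ISO-8859-1
// yields the original UTF-8 byte sequence again.
$repaired = mb_convert_encoding($broken, 'ISO-8859-1', 'UTF-8');

echo bin2hex($broken), "\n";   // hex dump of the double-encoded string
echo $repaired === $correct ? "repaired\n" : "not repaired\n";
```

    The same conversion applied to a string that was never double-encoded would corrupt it, which is why the data has to be consistently broken for a bulk fix.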

    Obviously, make backups before trying this, and go through the data with a fine-tooth comb afterwards.
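
    If the data turns out to be a mix of good and broken rows, a per-string check is one possible alternative to a bulk re-import. The sketch below is an assumption-laden heuristic, not a tested migration: `fixDoubleEncoded` is a hypothetical helper name, it requires the mbstring extension, and it only repairs strings whose broken form contains nothing above U+00FF. Strings that passed through MySQL's latin1 and picked up characters such as ™ are left unchanged, and a genuine string that happens to look like mojibake would be altered.

```php
<?php
// Hypothetical helper: undo one level of ISO-8859-1 -> UTF-8 double encoding,
// but only when the result is itself valid UTF-8. Requires mbstring.
function fixDoubleEncoded(string $s): string
{
    // Not valid UTF-8 at all: leave it alone.
    if (!mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Pure ASCII cannot be double-encoded.
    if (!preg_match('/[^\x00-\x7F]/', $s)) {
        return $s;
    }
    // Any code point above U+00FF cannot come from an ISO-8859-1 misread.
    if (preg_match('/[^\x{00}-\x{FF}]/u', $s)) {
        return $s;
    }
    // Map each code point back to a single byte.
    $candidate = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');

    // Accept the candidate only if those bytes form valid UTF-8.
    return mb_check_encoding($candidate, 'UTF-8') ? $candidate : $s;
}

$broken  = "Ti\xC3\x83\xC2\xABsto";  // "TiÃ«sto"
$correct = "Ti\xC3\xABsto";          // "Tiësto"

echo fixDoubleEncoded($broken) === $correct ? "fixed\n" : "unchanged\n";
echo fixDoubleEncoded($correct) === $correct ? "left alone\n" : "damaged\n";
```

    Running such a check over a copy of the table first, and comparing the before and after values by eye, keeps the original data safe while the heuristic is being evaluated.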
