Advice on how to fix Unicode and language problems in an existing database

I have a client whose database contains garbled characters. I inherited this project, and my guess is that user-entered text was not processed or stored correctly by PHP, MySQL, or both. For example:

Ex 1: the database field ("about") has values that look like this:

Dans la nature, face au ciel, un b%uFFFDb%uFFFD qui sourit quand on lui souffle sur le visage.

The collation on this field in MySQL is currently set to: latin1_swedish_ci

Ex 2: Another field ("description") looks like this:

VidÃƒÆ’Ã‚Â©o tournÃƒÆ’Ã‚Â©e dans le cadre

The collation on this field in MySQL is currently set to: utf8_general_ci

Basically, I have to fix all of this. These examples are French, but other records may contain Japanese or Chinese (and thus multi-byte characters).

For entries like example 1, my plan is to change the field to utf8_general_ci and write a script that converts all the %uXXXX Unicode escapes to the characters they stand for (I'm not exactly sure how to do this latter part; ideas?).
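
The conversion step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than PHP, and it assumes the `%uXXXX` sequences are the non-standard escapes of the kind JavaScript's legacy `escape()` function emits, which is a guess about how the data was produced. The helper name `decode_percent_u` is hypothetical. Note that `%uFFFD` is the Unicode replacement character, so decoding it only yields a placeholder glyph: the original letter was already lost before the value was stored and cannot be recovered by a script.

```python
import re

# Matches one "%uXXXX" escape (four hex digits), the form seen in Ex 1.
# Assumption: the data contains only this escape style.
_PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")


def decode_percent_u(text: str) -> str:
    """Replace each %uXXXX escape with the character for that code point."""
    return _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    sample = "un b%uFFFDb%uFFFD qui sourit"
    # The two escapes become U+FFFD replacement characters, not the lost letters.
    print(decode_percent_u(sample))
```

Characters outside the Basic Multilingual Plane would appear as two consecutive escapes (a surrogate pair), which this sketch does not recombine; that case would need extra handling if it occurs in the data.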

For entries like example 2, I'm not sure what those odd characters are.

Is utf8_general_ci the collation I should be using here to support all possible languages in one database table?

Other stats:

[peter@akebono A_PSG]$ php --version
PHP 5.2.6 (cli) (built: May 8 2008 08:54:23)
Copyright (c) 1997-2008 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
with Zend Debugger v5.2.14, Copyright (c) 1999-2008, by Zend Technologies


douhan8610 2011-02-22 04:06
Have a look at this article on the approaches you could take: http://www.phpwact.org/php/i18n/charsets

I remember we had the same problem, but we used a MySQL utility to change the encoding. I forget which one now.

With PHP, you should be looking at iconv and the other character set encoding/decoding methods to detect the current encoding and change it to whatever standard you're going to go with.

EDIT

Also, have a look at the multibyte string functions in PHP. Start with: http://www.php.net/manual/en/function.mb-convert-encoding.php
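
As a rough illustration of the re-encoding approach, here is a minimal sketch in Python (not PHP) of how text like Ex 2 might be repaired. It assumes the damage came from UTF-8 bytes being repeatedly misread as Windows-1252 and then saved as UTF-8 again; the pattern in Ex 2 is consistent with that happening three times, but the actual history of the data is unknown. The function name `undo_mojibake` and the `max_rounds` limit are hypothetical. The same round trip could be attempted in PHP with `iconv` or `mb_convert_encoding`.

```python
def undo_mojibake(text: str, max_rounds: int = 5) -> str:
    """Reverse repeated 'UTF-8 bytes read as cp1252' damage, one round at a time.

    Assumption: cp1252 was the wrong charset applied at each step.
    Stops as soon as a round fails or changes nothing.
    """
    for _ in range(max_rounds):
        try:
            # Recover the bytes that were misread, then decode them correctly.
            candidate = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # The text is no longer (or never was) this kind of mojibake.
            break
        if candidate == text:
            break
        text = candidate
    return text


if __name__ == "__main__":
    # Build a triple-damaged sample the same way the damage is assumed to occur.
    garbled = "Vidéo tournée"
    for _ in range(3):
        garbled = garbled.encode("utf-8").decode("cp1252")
    print(garbled)
    print(undo_mojibake(garbled))
```

A loop like this can misfire on rows that are already correct but happen to contain character sequences that look like mojibake, so it seems safer to run it against a copy of the table and review a sample of the changes before updating the real data.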

This answer was selected by the asker as the best answer.


