dp518158 2013-06-16 16:52
Accepted

Is MySQL collation used only for sorting?

According to the official MySQL manual, the collation defines the order of records when sorting alphabetically:

http://dev.mysql.com/doc/refman/5.0/en/charset-general.html

However, I have a PHP script (UTF-8), and when I save some foreign characters in my MySQL database, they are stored garbled (first row). This happens when the collation I choose is latin1_swedish_ci. When I change the collation to utf8_unicode_ci, everything is fine (second row).

[Image: the upper row shows data saved with collation latin1_swedish_ci; the lower row shows the result after saving with utf8_unicode_ci]

When saving this data, everything is exactly the same except for the collation. So what about "collation is used solely for sorting records"?

I hope someone can clarify this for me :-) Thanks in advance!

1 answer

  • dpoh61610 2013-06-16 16:57

    It appears that the charset of your connection is not set correctly, so the conversion from your script's charset to the database's charset goes wrong.

    You should set the charset in your connection; then both collations will work fine.

    As requested in the comments, here is a short explanation of how this works.

    When you have not set the character set for your connection, the server assumes it matches the character set of the database. When data arrives in a different encoding, it is written anyway, just with wrong or different characters than it had in the script's encoding.

    As long as nothing changes, the script gets back the same data it wrote, and everything appears to be fine.

    However, when either the connection encoding or the database encoding is changed at this point, the already stored data gets converted to the new encoding. The problem is that the source data is not actually in the encoding that the conversion assumes.
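    The mechanism described above can be sketched without a database. This is only a simulation under stated assumptions: Python's "latin-1" codec stands in for MySQL's latin1 (which is closer to cp1252), and the variable names are made up for illustration.

```python
# Simulation only: Python's "latin-1" codec stands in for MySQL's latin1.
original = "café"

# The script sends UTF-8 bytes over a connection the server treats as latin1.
sent_bytes = original.encode("utf-8")

# The server interprets every byte as one latin1 character and stores that.
stored_text = sent_bytes.decode("latin-1")

# Reading back over the same misconfigured connection undoes the mistake,
# so the script sees exactly what it wrote.
round_trip = stored_text.encode("latin-1").decode("utf-8")

# Once the stored "latin1" text is converted to UTF-8, the mistake is baked
# in: a correctly configured UTF-8 client now reads mojibake.
garbled = stored_text.encode("utf-8").decode("utf-8")

# Reversing the wrong conversion recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(round_trip)  # café
print(garbled)     # cafÃ©
print(repaired)    # café
```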

    The encodings involved here (latin1 and UTF-8) store the ASCII range with identical bytes, which is why ASCII characters don't get mangled. Only special characters do.
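    A quick check of that claim, again using Python's "latin-1" codec as a stand-in for MySQL's latin1 (an assumption, as are the sample strings):

```python
# ASCII text produces the same bytes in latin1 and UTF-8.
ascii_text = "SELECT name FROM users"
ascii_same = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# A non-ASCII character does not: one byte in latin1, two bytes in UTF-8.
special = "ñ"
special_latin1 = special.encode("latin-1")
special_utf8 = special.encode("utf-8")

print(ascii_same)      # True
print(special_latin1)  # b'\xf1'
print(special_utf8)    # b'\xc3\xb1'
```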

    So you have to set your connection encoding to avoid producing the mess you are already in.
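    As a rough sketch of what "setting the connection encoding" means, the helper below issues MySQL's SET NAMES statement right after connecting. The function name is hypothetical, and it assumes a DB-API style connection object with cursor(); in a PHP script the equivalent step would normally be mysqli_set_charset, and many drivers accept a charset argument at connect time instead.

```python
import re


def set_connection_charset(connection, charset="utf8"):
    """Tell the server which encoding this client sends and expects.

    `connection` is assumed to be a DB-API style object exposing cursor().
    """
    # Charset names are plain identifiers; refuse anything else instead of
    # interpolating arbitrary text into the statement.
    if not re.fullmatch(r"[A-Za-z0-9_]+", charset):
        raise ValueError(f"suspicious charset name: {charset!r}")
    cursor = connection.cursor()
    try:
        cursor.execute(f"SET NAMES {charset}")
    finally:
        cursor.close()
```

    Call it once, immediately after opening the connection and before any data is read or written.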

    Now, what can you do about the data you already have?

    You can dump your database with mysqldump using the --skip-set-charset option, which gives you a plain-text file. In this file, replace all occurrences of the declared database charset with the one the data is really in (the one your script used when it wrote the data).

    Then save the file, making sure your editor does not do any conversion (I recommend vim).
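    If you would rather script the replacement than do it in an editor, a minimal sketch follows. It works on raw bytes so nothing gets re-encoded along the way. The function name and the default charset names (latin1 declared, utf8 actual) are assumptions for illustration; a blind replace also touches collation names and any row data that happens to contain the charset name, so review the result before importing.

```python
def relabel_dump(src_path, dst_path, declared=b"latin1", actual=b"utf8"):
    """Copy a dump, replacing the declared charset name with the real one.

    Files are opened in binary mode so the stored bytes are never
    decoded or re-encoded. Returns the number of replacements made.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    count = data.count(declared)
    with open(dst_path, "wb") as dst:
        dst.write(data.replace(declared, actual))
    return count
```

    The return value gives a quick sanity check: a count much higher than the number of tables and columns suggests the name also appeared inside the data.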

    Then import that file, and you will get a database with data in the correct encoding. After that you can change the encoding however you like; as long as your connection charset is also set, you will be fine from now on.

    Also make sure the MySQL server has the required charsets installed, though it usually does.

    This is only my approach, but I have cleaned up a lot of messed-up installations this way. Most of them ended up with garbled characters at some point (after switching servers, updating, or restoring a backup...). It turns out that not setting the connection charset is very often forgotten.

    This answer was selected by the asker as the best answer.
