douziqian2871 2011-10-06 20:18
Accepted

UTF-8 website and latin1 database table field

I am using a MySQL database with both the InnoDB and MyISAM engines. I want to see the difference between utf8 and latin1, so I ran a test.
Code on the website:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Collation of the field in the database table:

latin1_swedish_ci

Then I typed the Chinese words "我爱你" and the characters "ĀāĂ㥹ĆćĈ" on the website and clicked the submit button. PHP and MySQL stored the words in the database, then MySQL retrieved the values and they were displayed back on the website.

Output:
The website shows "我爱你" and "ĀāĂ㥹ĆćĈ".
The database table field shows "我爱ä½" and "Ā".

I also tested all of the following changes:
1) Change the website's meta tag to latin-1 and the table field to utf-8.
2) Change the website's meta tag to utf-8 and the table field to utf-8 too.
3) Change the website's meta tag to latin-1 and the table field to latin-1 too.
But the output is still the same; nothing changes. Why?

Is it that I can't test with this method? If so, how do I test the difference between utf-8 and latin-1?
How do I make the database table field show the words "我爱你" and "ĀāĂ㥹ĆćĈ"?

I am developing a social networking website like Facebook.com that supports multiple languages. Should I use utf-8 for the database fields? The disadvantage of utf-8 is that it takes 3 bytes per character, while latin-1 takes only 1 byte. To save storage it seems better to use latin-1, but I am not sure what problems I will run into later if I use latin-1 instead of utf-8. Can anyone give me some advice on how to decide which character set to use?


2 answers

  • douyinghuo8874 2011-10-06 21:25

    1) Note that you can't really peek into a database record without using some kind of software, and every tool brings its own bag of issues. phpMyAdmin has a character set config option, other software products assume a character set internally, and even a command prompt window has a code page. The important thing is to make sure you get back from the database exactly what you put into it, not how it is stored in the tablespace. Use "SET NAMES character-set" to keep a consistent charset throughout the whole connection.

    2) UTF-8 is clearly where the world is moving, because it works and because it can store characters from every language (writing system) you're likely to encounter. With latin-1 you are cutting out every language that is not from Western Europe. This means not only Chinese, Cyrillic, Greek, Hebrew and such, but also Eastern Europe, Turkey and a lot of other places that basically use the Latin alphabet with a few particular letters added.

    3) UTF-8 is, and is expected to remain, by far the most future-proof solution.

    4) It's much safer (and saner), even for monolingual applications, to do the right thing from the start (that would be UTF-8), rather than have to convert your multi-gigabyte tables later on, when you find out you need more. Nobody who has had to do such a thing liked the experience.

    5) Disk space is a commodity, cheaper by the day. If you are about to do 'social', you should just take a boatload of it (if the thing flies you'll need it anyway) and forget about it. There are other issues that will bite you much sooner than disk: performance under load, access concurrency, clustering and load balancing across multiple servers. I can't remember a single social network lamenting issues because of those 3 bytes.
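
    To make points 1 and 5 concrete, here is a minimal sketch in plain Python (no database involved). It assumes that the connection charset was left at latin1, which the question does not state, and that MySQL's latin1 behaves like Windows-1252, so Python's "cp1252" codec is used as a stand-in. The function names are hypothetical and exist only for this illustration.

```python
# Illustrative only: Python's "cp1252" codec stands in for MySQL's latin1,
# and the function names below are hypothetical helpers, not a MySQL API.

def store_with_wrong_charset(text: str) -> str:
    """Simulate UTF-8 bytes being labelled as latin1 (cp1252) on the way in."""
    return text.encode("utf-8").decode("cp1252")


def read_with_wrong_charset(stored: str) -> str:
    """Undo the same mislabelling on the way out."""
    return stored.encode("cp1252").decode("utf-8")


def byte_cost(ch: str) -> dict:
    """Bytes needed per encoding; None when the encoding cannot represent ch."""
    cost = {}
    for name in ("utf-8", "latin-1"):
        try:
            cost[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            cost[name] = None
    return cost


original = "我爱你"
stored = store_with_wrong_charset(original)    # roughly what an admin tool displays
round_trip = read_with_wrong_charset(stored)   # what the web page gets back

print(repr(stored))
print(round_trip == original)
for ch in ("A", "\u00e9", "我"):
    print(ch, byte_cost(ch))
```

    If those assumptions hold, `stored` is the same kind of garbled text seen in the table field, while `round_trip` still equals the original string. That would explain why the web page looked correct under every combination tried: the bytes were mislabelled the same way in both directions. `byte_cost` shows that UTF-8 is variable-length: 1 byte for "A", 2 for "é", 3 for "我", whereas latin-1 cannot encode "我" at all.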

    This answer was selected by the asker as the best answer.