doushadu0901 2014-06-17 00:11
浏览 86
已采纳

MySQL,UTF-8和Emoji字符

I'm working on an iOS app with a PHP+MySQL backend. The app has a chat section, which needs to support emoji. My tables are utf8_unicode_ci. If I don't call 'set names utf8' in my scripts, emoji it actually works - whatever is entered in the database, is returned to the clients as it should.

The problem is that this (if I understand it correctly) stores special characters incorrectly in the database, and this breaks string comparing (ie ï is no longer the same as i when comparing strings).

However, if I do call set names utf8, suddenly the emoji characters are inserted as a bunch of questionmarks.

Any suggestions on the proper way of handling this? Thanks!

  • 写回答

1条回答 默认 最新

  • dongye1143 2014-06-17 00:48
    关注

    The issue is wether the db has a diacritical insensitive compare. The other issue is composed characters, ï can be expressed as either one unicode character or two forming a surrogate pair. There are methods to convert a string to a pre-composed or decomposed form: precomposedStringWith* and decomposedStringWith*.

    It seems that MySQL supports two forms of unicode ucs2 (that is an older form that was supersede by utf16) which is 16-bits per character and utf8 up to 3 bytes per character. The bad news is that neither form is going to support plane 1 characters which require at 17 bits. (mainly emoji). It looks like MySQL 5.5.3 and up also support utf8mb4, utf16, and utf32 support BMP and supplementary characters (read emoji). See MySQL Unicode Character Sets.

    Here is some code and results to demonstrate the different unicode byte representations.
    Unicode is a 21 bit encoding system.
    UTF32 directly represents the code points and clearly demonstrates decomposed surrogate pairs.
    UTF8 and UTF16 require one or more bytes to represent a unicode character.

    NSLog(@"character: %@", @"Å");
    NSLog(@"decomposedStringWithCanonicalMapping UTF8: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF8StringEncoding]);
    NSLog(@"decomposedStringWithCanonicalMapping UTF16: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF16BigEndianStringEncoding]);
    NSLog(@"decomposedStringWithCanonicalMapping UTF32: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF32BigEndianStringEncoding]);

    NSLog(@"precomposedStringWithCanonicalMapping UTF8: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF8StringEncoding]);
    NSLog(@"precomposedStringWithCanonicalMapping UTF16: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF16BigEndianStringEncoding]);
    NSLog(@"precomposedStringWithCanonicalMapping UTF32: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF32BigEndianStringEncoding]);

    NSLog(@"character: %@", @"

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 如何在scanpy上做差异基因和通路富集?
  • ¥20 关于#硬件工程#的问题,请各位专家解答!
  • ¥15 关于#matlab#的问题:期望的系统闭环传递函数为G(s)=wn^2/s^2+2¢wn+wn^2阻尼系数¢=0.707,使系统具有较小的超调量
  • ¥15 FLUENT如何实现在堆积颗粒的上表面加载高斯热源
  • ¥30 截图中的mathematics程序转换成matlab
  • ¥15 动力学代码报错,维度不匹配
  • ¥15 Power query添加列问题
  • ¥50 Kubernetes&Fission&Eleasticsearch
  • ¥15 報錯:Person is not mapped,如何解決?
  • ¥15 c++头文件不能识别CDialog