doushadu0901 2014-06-16 16:11

MySQL, UTF-8 and Emoji Characters

I'm working on an iOS app with a PHP+MySQL backend. The app has a chat section, which needs to support emoji. My tables use the utf8_unicode_ci collation. If I don't call SET NAMES utf8 in my scripts, emoji actually work: whatever is entered into the database is returned to the clients as it should be.

The problem is that (if I understand it correctly) this stores special characters incorrectly in the database, which breaks string comparison (i.e., ï no longer compares equal to i).
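
A minimal Python sketch of the likely mechanism, assuming the connection falls back to latin1 when SET NAMES is not issued, as MySQL installations of that era commonly did. Python's `latin-1` codec stands in for MySQL's `latin1` (which is really closer to cp1252), so the exact intermediate characters may differ slightly. Both function names are hypothetical.

```python
# Model of a UTF-8 client talking to a server that assumes a latin1 connection.
# Assumption: Python's "latin-1" codec approximates MySQL's latin1 charset.


def store_via_latin1_connection(text: str) -> str:
    """Characters the server believes it received (and stores in the column)."""
    return text.encode("utf-8").decode("latin-1")


def read_via_latin1_connection(stored: str) -> str:
    """What the client sees after the server sends those characters back."""
    return stored.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    for original in ["i", "\u00ef", "\U0001F600"]:  # i, ï, grinning-face emoji
        stored = store_via_latin1_connection(original)
        print(ascii(original), "stored as", ascii(stored),
              "read back as", ascii(read_via_latin1_connection(stored)))
```

In this model ï is stored as two characters (Ã followed by ¯), so a collation such as utf8_unicode_ci compares those against i rather than ï. The bytes still round-trip, which is why the app appears to work.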

However, if I do call SET NAMES utf8, the emoji characters are suddenly inserted as a bunch of question marks.
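
MySQL's `utf8` character set stores at most 3 bytes per character, while emoji outside the Basic Multilingual Plane take 4 bytes in UTF-8. The Python sketch below checks whether a string fits that limit and models the question-mark substitution described above. The helper names are hypothetical, and the substitution is only a model of the reported symptom: depending on server version and SQL mode, MySQL may instead reject the value with an "Incorrect string value" error.

```python
MYSQL_UTF8_MAX_BYTES = 3  # per-character limit of MySQL's legacy 3-byte "utf8"


def utf8_byte_length(ch: str) -> int:
    """Number of bytes the character occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every character fits in a 3-byte utf8 column."""
    return all(utf8_byte_length(ch) <= MYSQL_UTF8_MAX_BYTES for ch in text)


def simulate_question_marks(text: str) -> str:
    """Hypothetical model of the symptom: 4-byte characters become '?'."""
    return "".join(
        ch if utf8_byte_length(ch) <= MYSQL_UTF8_MAX_BYTES else "?" for ch in text
    )


if __name__ == "__main__":
    print(fits_mysql_utf8("na\u00efve"))                 # True: ï needs 2 bytes
    print(fits_mysql_utf8("chat \U0001F600"))            # False: emoji needs 4
    print(simulate_question_marks("chat \U0001F600"))    # chat ?
```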

Any suggestions on the proper way of handling this? Thanks!


1 Answer

  • dongye1143 2014-06-16 16:48

    The issue is whether the database performs a diacritic-insensitive comparison. The other issue is composed characters: ï can be expressed either as one Unicode character (U+00EF) or as two (i followed by the combining diaeresis U+0308). NSString has methods to convert a string to a precomposed or decomposed form: precomposedStringWith* and decomposedStringWith*.
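
Both ideas can be illustrated with Python's standard `unicodedata` module, where NFC and NFD play the role of NSString's `precomposedStringWithCanonicalMapping` and `decomposedStringWithCanonicalMapping`. The `strip_diacritics` helper is a hypothetical, simplified stand-in for a diacritic-insensitive collation such as utf8_unicode_ci; the real collation uses Unicode Collation Algorithm weights rather than decompose-and-drop.

```python
import unicodedata

COMPOSED = "\u00ef"     # ï as one code point (LATIN SMALL LETTER I WITH DIAERESIS)
DECOMPOSED = "i\u0308"  # i followed by COMBINING DIAERESIS


def nfc(text: str) -> str:
    """Precomposed form, like precomposedStringWithCanonicalMapping."""
    return unicodedata.normalize("NFC", text)


def nfd(text: str) -> str:
    """Decomposed form, like decomposedStringWithCanonicalMapping."""
    return unicodedata.normalize("NFD", text)


def strip_diacritics(text: str) -> str:
    """Decompose, then drop combining marks: a rough accent-insensitive key."""
    return "".join(ch for ch in nfd(text) if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(COMPOSED == DECOMPOSED)                  # False: different code points
    print(nfc(DECOMPOSED) == COMPOSED)             # True
    print([hex(ord(ch)) for ch in nfd(COMPOSED)])  # ['0x69', '0x308']
    print(strip_diacritics(COMPOSED) == "i")       # True
```

Normalizing both sides to the same form before comparing makes the two spellings of ï equal; stripping marks goes one step further and makes ï equal to i.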

    It seems that MySQL supports two forms of Unicode: ucs2, which uses 16 bits per character (an older form that was superseded by UTF-16), and utf8, which uses up to 3 bytes per character. The bad news is that neither form supports plane 1 characters (mainly emoji), which require at least 17 bits. It looks like MySQL 5.5.3 and up also support utf8mb4, utf16, and utf32, which handle both BMP and supplementary characters (read: emoji). See MySQL Unicode Character Sets.
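
A rough Python sketch of these limits: it computes a character's Unicode plane and bit width and lists which of the MySQL character sets mentioned here can hold it. The limit table is a simplification taken from this answer (ucs2 and 3-byte utf8 cover only the BMP, U+0000 to U+FFFF; utf8mb4, utf16 and utf32 cover every code point), and `storable_in` is a hypothetical name.

```python
# Highest code point each MySQL character set can hold (simplified).
BMP_MAX = 0xFFFF
UNICODE_MAX = 0x10FFFF
MYSQL_CHARSET_LIMITS = {
    "ucs2": BMP_MAX,
    "utf8": BMP_MAX,         # 3-byte utf8
    "utf8mb4": UNICODE_MAX,  # MySQL 5.5.3 and later
    "utf16": UNICODE_MAX,
    "utf32": UNICODE_MAX,
}


def plane(ch: str) -> int:
    """Unicode plane number: 0 is the BMP, 1 holds most emoji."""
    return ord(ch) >> 16


def storable_in(ch: str) -> list:
    """Names of the charsets above whose range includes this character."""
    return [name for name, limit in MYSQL_CHARSET_LIMITS.items() if ord(ch) <= limit]


if __name__ == "__main__":
    emoji = "\U0001F600"  # GRINNING FACE
    print(plane(emoji), ord(emoji).bit_length())  # 1 17
    print(storable_in(emoji))                     # ['utf8mb4', 'utf16', 'utf32']
    print(storable_in("\u00ef"))                  # all five charsets
```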

    Here is some code to demonstrate the different Unicode byte representations.
    Unicode code points run up to U+10FFFF, so they need 21 bits.
    UTF-32 stores each code point directly, so it clearly shows the separate code points of a decomposed sequence.
    UTF-8 and UTF-16 use a variable number of units per character: UTF-8 uses 1 to 4 bytes, and UTF-16 uses one or two 16-bit units (two units form a surrogate pair).

    NSLog(@"character: %@", @"Å");
    // Decomposed (NFD): "A" followed by U+030A COMBINING RING ABOVE, logged as raw bytes
    NSLog(@"decomposedStringWithCanonicalMapping UTF8: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF8StringEncoding]);
    NSLog(@"decomposedStringWithCanonicalMapping UTF16: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF16BigEndianStringEncoding]);
    NSLog(@"decomposedStringWithCanonicalMapping UTF32: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF32BigEndianStringEncoding]);

    // Precomposed (NFC): the single code point U+00C5
    NSLog(@"precomposedStringWithCanonicalMapping UTF8: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF8StringEncoding]);
    NSLog(@"precomposedStringWithCanonicalMapping UTF16: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF16BigEndianStringEncoding]);
    NSLog(@"precomposedStringWithCanonicalMapping UTF32: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF32BigEndianStringEncoding]);

    NSLog(@"character: %@", @"
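
For readers without an Objective-C toolchain, here is a rough Python equivalent of the NSLog calls above. It uses `unicodedata.normalize` for NFD/NFC and the standard `utf-8`, `utf-16-be` and `utf-32-be` codecs in place of the big-endian NSStringEncoding constants. The `byte_forms` helper is hypothetical, and the emoji (U+1F600) is an illustrative choice to show a UTF-16 surrogate pair.

```python
import unicodedata

ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")  # big-endian, no BOM, like the NSLog example


def byte_forms(text: str) -> dict:
    """Hex bytes of text in each encoding."""
    return {enc: text.encode(enc).hex() for enc in ENCODINGS}


if __name__ == "__main__":
    for form in ("NFD", "NFC"):
        print(form, byte_forms(unicodedata.normalize(form, "\u00c5")))  # Å
    print("emoji", byte_forms("\U0001F600"))
```

The NFD form of Å encodes as 41cc8a in UTF-8 while the NFC form encodes as c385, even though both render identically. The emoji needs four UTF-8 bytes (f09f9880) and a surrogate pair in UTF-16 (d83dde00), which is why a 3-byte utf8 column cannot hold it.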


    This answer was selected by the asker as the best answer.
