dongquanjie9328 2016-05-31 00:43
Views: 234
Answer accepted

Go generates an unescaped control character in JSON output because of an emoji

I'm having trouble with something in Go and I'm not sure where to look. I'm fetching a UTF-8 string from a MySQL database, and attempting to return it in a JSON response to a client.

Different clients react differently, but iOS NSJSONSerialization returns an "Unescaped control character" error. This breaks the whole application. I can decode the JSON without issue in Chrome using JSON.parse(), though.

On the server side, the same generator function written in a language other than Go works fine. Help?


EDIT: Here is the JSON that is causing the issue:

{ "test":"☮️" }

... If I omit this emoji, it works. If it's there, it doesn't work. The issue seems to be something related to there being two different encodings for certain emoji. One seems to trip up Go, but they are both valid.

To demonstrate the difference in encoding, some of the emoji show up in the database explorer and some do not:

screenshot

... These ones that appear in the database explorer are causing this issue with 100% reproducibility. However, all of them usually appear in the actual client software (not the database explorer) without issue. I don't know if there's a way to reconfigure the database connection to avoid this (or something), but it seems to work with different instances depending on what is doing the decoding and how forgiving it is. Considering that users could type or copy/paste either encoding... this needs to work consistently.

Any help would be appreciated. Thanks in advance.


1 answer

  • duanmu1736 2016-05-31 02:38

    Go is doing fine.

    fmt.Println([]byte("☮️"))
    //[226 152 174 239 184 143]
    //Yup, 1 character - 6 bytes.
    

    NSJSONSerialization can't handle this. Maybe this link will be helpful: NSJSONSerialization and Emoji. It has something to do with NSData * utf32Data = [uniText dataUsingEncoding:NSUTF32LittleEndianStringEncoding];.

    Can you give us the byte representation of the "☮️" symbol in "iOS style", like I did with Go?

    UPD

    I did some research, and it looks like something is wrong with your database encoding. Is it UTF-16?

    Check this out

    package main

    import "fmt"

    func main() {
        // They look the same, but they are completely different "characters".
        // The first one is yours, and the second one is U+262E.
        const nihongo = "☮️☮"
        // Ranging over a string yields one rune per iteration, with its byte offset.
        for index, runeValue := range nihongo {
            fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
        }
        bad := []byte("☮️")
        good := []byte("☮")
        fmt.Printf("%v %s \n", bad, bad)
        fmt.Printf("%v %s \n", good, good)
    }
    

    Output:

    U+262E '☮' starts at byte position 0
    U+FE0F '️' starts at byte position 3
    U+262E '☮' starts at byte position 6
    [226 152 174 239 184 143] ☮️ 
    [226 152 174] ☮ 
    

    UPD2

    It just hit me! I was doing Ctrl+C/Ctrl+V all the way with your symbol. But it is not a single symbol! It's two symbols, and the second one is unprintable.

    // Needs the "fmt" and "unicode/utf8" imports.
    unprintable := []byte{239, 184, 143} // the last three bytes of your string
    fmt.Printf("valid? %v\n", utf8.Valid(unprintable))
    fmt.Println("full rune?", utf8.FullRune(unprintable))
    r, size := utf8.DecodeRune(unprintable)
    fmt.Println(r, size, string(r))
    fmt.Printf("valid rune? %v\n", utf8.ValidRune(r))
    

    Output:

    valid? true
    full rune? true
    65039 3 ️
    valid rune? true
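For what it's worth, the second code point (65039, i.e. U+FE0F, VARIATION SELECTOR-16) is not a control character in the Unicode sense either. The sketch below classifies both code points with Go's unicode package; `describe` is a hypothetical helper, and the labels are just for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// describe returns a short label for the kind of code point r is.
// (Hypothetical helper for this example only.)
func describe(r rune) string {
	switch {
	case unicode.IsControl(r):
		return "control character (Cc)"
	case unicode.Is(unicode.Mn, r):
		return "nonspacing mark (Mn)"
	case unicode.IsSymbol(r):
		return "symbol (S*)"
	default:
		return "other"
	}
}

func main() {
	// U+262E followed by U+FE0F, the sequence from the question.
	for _, r := range "\u262e\ufe0f" {
		fmt.Printf("%U: %s\n", r, describe(r))
	}
}
```

U+262E should come out as a symbol and U+FE0F as a nonspacing mark, so an "Unescaped control character" error for this string points at the parser rather than at the data.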
    

    So your DB is fine and the unprintable "character" is fine, but NSJSONSerialization cannot handle it. Better to ask the iOS community =)

    This answer was selected by the asker as the best answer.
