dongquanjie9328 2016-05-31 00:43
Views: 234
Accepted

Go produces an unescaped control character in JSON output because of an emoji

I'm having trouble with something in Go and I'm not sure where to look. I'm fetching a UTF-8 string from a MySQL database, and attempting to return it in a JSON response to a client.

Different clients react differently, but iOS NSJSONSerialization returns an "Unescaped control character" error. This breaks the whole application. I can decode the JSON without issue in Chrome using JSON.parse(), though.

On the server side, the same generator function written in a language other than Go works fine. Help?


EDIT: Here is the JSON that is causing the issue:

{ "test":"☮️" }

... If I omit this emoji, it works. If it's there, it doesn't work. The issue seems to be something related to there being two different encodings for certain emoji. One seems to trip up Go, but they are both valid.

To demonstrate the difference in encoding, some of the emoji show up in the database explorer and some do not:

screenshot

... These ones that appear in the database explorer are causing this issue with 100% reproducibility. However, all of them usually appear in the actual client software (not the database explorer) without issue. I don't know if there's a way to reconfigure the database connection to avoid this (or something), but it seems to work with different instances depending on what is doing the decoding and how forgiving it is. Considering that users could type or copy/paste either encoding... this needs to work consistently.

Any help would be appreciated. Thanks in advance.


1 answer

  • duanmu1736 2016-05-31 02:38

    Go is doing fine.

    fmt.Println([]byte("☮️"))
    //[226 152 174 239 184 143]
    //Yup, 1 character - 6 bytes.
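
    One way to check the claim that Go is doing fine is to run the string through encoding/json and look for raw control bytes in the result. This is only a sketch: it assumes the response is built with json.Marshal (the question does not show the generator), and hasRawControl and encode are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasRawControl reports whether b contains a raw byte below 0x20,
// which JSON does not allow unescaped inside a string literal.
func hasRawControl(b []byte) bool {
	for _, c := range b {
		if c < 0x20 {
			return true
		}
	}
	return false
}

// encode marshals a one-field object holding s under the key "test".
// Assumption: the real server builds its response in a comparable way.
func encode(s string) ([]byte, error) {
	return json.Marshal(map[string]string{"test": s})
}

func main() {
	// U+262E (peace symbol) followed by U+FE0F, as in the question.
	out, err := encode("\u262e\ufe0f")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
	fmt.Println("valid JSON:", json.Valid(out))
	fmt.Println("raw control bytes:", hasRawControl(out))
}
```

    If the server assembles JSON by hand instead of using json.Marshal, the same hasRawControl check can be pointed at the bytes it actually sends.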
    

    NSJSONSerialization can't handle this. Maybe this link will be helpful: NSJSONSerialization and Emoji. It has something to do with NSData * utf32Data = [uniText dataUsingEncoding:NSUTF32LittleEndianStringEncoding];.

    Can you give us the byte representation of the "☮️" symbol in "iOS style", like I did with Go?

    UPD

    I did some research, and it looks like something is wrong with your database encoding. Is it UTF-16?

    Check this out

    // They look the same, but they are completely different "characters".
    // The first one is yours, and the second one is a bare U+262E.
    const nihongo = "☮️☮"
    // Ranging over a string yields one rune (code point) per iteration.
    for index, runeValue := range nihongo {
            fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
    }
    bad := []byte("☮️")
    good := []byte("☮")
    fmt.Printf("%v %s \n", bad, bad)
    fmt.Printf("%v %s \n", good, good)
    

    Output:

    U+262E '☮' starts at byte position 0
    U+FE0F '️' starts at byte position 3
    U+262E '☮' starts at byte position 6
    [226 152 174 239 184 143] ☮️ 
    [226 152 174] ☮ 
    

    UPD2

    It just hit me! I was copying and pasting your symbol all along. But it is not a single symbol! It's two symbols, and the second one (U+FE0F, a variation selector) is unprintable.

    // Requires the "fmt" and "unicode/utf8" imports.
    unprintable := []byte{239, 184, 143}
    fmt.Printf("valid? %v\n", utf8.Valid(unprintable))
    fmt.Println("full rune?", utf8.FullRune(unprintable))
    r, size := utf8.DecodeRune(unprintable)
    fmt.Println(r, size, string(r))
    fmt.Printf("valid rune? %v\n", utf8.ValidRune(r))
    

    Output:

    valid? true
    full rune? true
    65039 3 ️
    valid rune? true
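
    To see what kind of code point the second rune is, the standard unicode package can classify it. A minimal sketch, where describe is a hypothetical helper name; it suggests that U+FE0F is a variation selector rather than a control character in Go's sense:

```go
package main

import (
	"fmt"
	"unicode"
)

// describe labels a rune as a control character, a variation selector,
// or anything else, using the tables in the standard unicode package.
func describe(r rune) string {
	switch {
	case unicode.IsControl(r):
		return "control"
	case unicode.Is(unicode.Variation_Selector, r):
		return "variation selector"
	default:
		return "other"
	}
}

func main() {
	// Peace symbol, variation selector-16, and a newline for comparison.
	for _, r := range "\u262e\ufe0f\n" {
		fmt.Printf("%U %s\n", r, describe(r))
	}
}
```

    This prints "other" for U+262E, "variation selector" for U+FE0F, and "control" for U+000A.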
    

    So your database is fine and the unprintable "character" is fine, but NSJSONSerialization cannot handle it. It would be better to ask the iOS community =)
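
    If changing the client is not an option, one possible server-side workaround is to send pure-ASCII JSON by writing every non-ASCII rune as a \uXXXX escape. This is only a sketch: escapeNonASCII is a hypothetical helper, it assumes the input is JSON produced by json.Marshal (so non-ASCII runes occur only inside string literals), and whether NSJSONSerialization accepts the escaped form has not been verified here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"unicode/utf16"
)

// escapeNonASCII rewrites every non-ASCII rune in already-encoded JSON
// as a \uXXXX escape, using a UTF-16 surrogate pair above U+FFFF.
// Assumption: js is valid UTF-8 JSON, e.g. the output of json.Marshal.
func escapeNonASCII(js []byte) []byte {
	var buf bytes.Buffer
	for _, r := range string(js) {
		switch {
		case r < 0x80:
			buf.WriteByte(byte(r))
		case r > 0xFFFF:
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&buf, `\u%04x\u%04x`, hi, lo)
		default:
			fmt.Fprintf(&buf, `\u%04x`, r)
		}
	}
	return buf.Bytes()
}

func main() {
	raw, err := json.Marshal(map[string]string{"test": "\u262e\ufe0f"})
	if err != nil {
		panic(err)
	}
	escaped := escapeNonASCII(raw)
	fmt.Printf("%s\n", escaped) // {"test":"\u262e\ufe0f"}

	// The escaped form decodes back to the original string.
	var back map[string]string
	if err := json.Unmarshal(escaped, &back); err != nil {
		panic(err)
	}
	fmt.Println(back["test"] == "\u262e\ufe0f") // true
}
```

    The payload gets larger, but every byte on the wire is plain ASCII, which takes the client's UTF-8 handling out of the picture.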

    This answer was selected by the asker as the best answer.
