dongliushui2001 2018-10-18 17:30

How do I process strings containing emoji in Go (decode or remove invalid Unicode code points)?

Example string:

"\u0410\u043b\u0435\u043a\u0441\u0430\u043d\u0434\u0440\u044b! 
\u0421\u043f\u0430\u0441\u0438\u0431\u043e \ud83d\udcf8 link.ru \u0437\u0430 
#hashtag  Русское слово, an English word"

Without the \ud83d\udcf8 sequence, my function works well:

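import "strconv"

// convertUnicode decodes the backslash escapes in text by unquoting it as a
// Go string literal. If unquoting fails, the input is returned unchanged.
func convertUnicode(text string) string {
    s, err := strconv.Unquote(`"` + text + `"`)
    if err != nil {
        // Error.Printf("can't convert: %s | err: %s\n", text, err)
        return text
    }
    return s
}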

My question is how to detect that the text contains entries like this, and how to either convert them to emoji or remove them from the text. Thanks.

1 answer

  • dsadsadsa1231 2018-10-18 20:44

    Well, it is probably not that simple: neither \ud83d nor \udcf8 is a valid code point on its own, but together they form a surrogate pair, which UTF-16 uses to encode \U0001F4F8. strconv.Unquote does not combine the two halves for you, so you have to combine them yourself.
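
To make the surrogate-pair arithmetic concrete, here is a small sketch. The helper name `combine` is hypothetical and only illustrates the formula; in real code `utf16.DecodeRune` from the standard library does the same job and also validates its arguments.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// combine (hypothetical helper) applies the UTF-16 formula to a high and a
// low surrogate. It does not validate its arguments.
func combine(hi, lo rune) rune {
	return (hi-0xD800)<<10 + (lo - 0xDC00) + 0x10000
}

func main() {
	hi, lo := rune(0xD83D), rune(0xDCF8)
	fmt.Println(utf16.IsSurrogate(hi), utf16.IsSurrogate(lo)) // true true
	fmt.Printf("%U\n", combine(hi, lo))                       // U+1F4F8
	fmt.Println(combine(hi, lo) == utf16.DecodeRune(hi, lo))  // true
}
```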

    1. Use strconv.Unquote to unquote as you did.
    2. Convert to []rune for convenience.
    3. Find surrogate pairs with unicode/utf16.IsSurrogate.
    4. Combine surrogate pairs with unicode/utf16.DecodeRune.
    5. Convert back to string.
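
The steps above can be sketched roughly as follows. This is a minimal sketch under some assumptions, not a definitive implementation: depending on the Go version, strconv.Unquote may not hand back the surrogate halves at all (as far as I know, newer releases reject them as a syntax error), so the sketch parses the \uXXXX escapes itself instead of calling strconv.Unquote. The names `parseEscape` and `decodeEscapes` are hypothetical, only \uXXXX escapes are handled, and a surrogate half without a matching partner is simply dropped.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
	"unicode/utf16"
)

// parseEscape (hypothetical helper) reports the 16-bit value of a \uXXXX
// escape at the start of s, or false if s does not start with one.
func parseEscape(s string) (rune, bool) {
	if len(s) < 6 || s[0] != '\\' || s[1] != 'u' {
		return 0, false
	}
	v, err := strconv.ParseUint(s[2:6], 16, 16)
	if err != nil {
		return 0, false
	}
	return rune(v), true
}

// decodeEscapes (hypothetical helper) replaces \uXXXX escapes in text with
// the characters they encode. Surrogate pairs are combined into one code
// point; unpaired surrogate halves are removed.
func decodeEscapes(text string) string {
	var b strings.Builder
	for i := 0; i < len(text); {
		r, ok := parseEscape(text[i:])
		if !ok {
			// Not an escape: copy the byte through unchanged.
			b.WriteByte(text[i])
			i++
			continue
		}
		i += 6
		if !utf16.IsSurrogate(r) {
			b.WriteRune(r)
			continue
		}
		// r is a surrogate half; it only makes sense with the next escape.
		if r2, ok := parseEscape(text[i:]); ok {
			if c := utf16.DecodeRune(r, r2); c != unicode.ReplacementChar {
				b.WriteRune(c)
				i += 6
				continue
			}
		}
		// Unpaired surrogate: write nothing, which drops it from the output.
	}
	return b.String()
}

func main() {
	in := `\u0421\u043f\u0430\u0441\u0438\u0431\u043e \ud83d\udcf8 link.ru`
	fmt.Println(decodeEscapes(in)) // Спасибо 📸 link.ru
}
```

To replace unpaired halves with a visible marker instead of removing them, write `unicode.ReplacementChar` to the builder at the point where the sketch drops them.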
    This answer was selected by the asker as the best answer.