douchun1859 2015-05-31 02:56
45 views
Accepted

String to UCS-2

I want to translate my Python program, which converts a Unicode string to a UCS-2 hex string, into Go.

In Python, it's quite simple:

    u"Bien joué".encode('utf-16-be').encode('hex')
    -> 004200690065006e0020006a006f007500e9

I am a beginner in Go and the simplest way I found is:

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        str := "Bien joué"
        fmt.Printf("str: %s\n", str)
        ucs2HexArray := []rune(str)
        s := fmt.Sprintf("%U", ucs2HexArray)
        a := strings.Replace(s, "U+", "", -1)
        b := strings.Replace(a, "[", "", -1)
        c := strings.Replace(b, "]", "", -1)
        d := strings.Replace(c, " ", "", -1)
        fmt.Printf("->: %s", d)
    }

Output:

    str: Bien joué
    ->: 004200690065006E0020006A006F007500E9
    Program exited.

I really think this is not efficient. How can I improve it?

Thank you



3 answers

  • douzhulan1815 2015-05-31 05:24

    Make this conversion a function; then you can easily improve the conversion algorithm in the future. For example,

        package main

        import (
            "fmt"
            "strings"
            "unicode/utf16"
        )

        func hexUTF16FromString(s string) string {
            hex := fmt.Sprintf("%04x", utf16.Encode([]rune(s)))
            return strings.Replace(hex[1:len(hex)-1], " ", "", -1)
        }

        func main() {
            str := "Bien joué"
            fmt.Println(str)
            hex := hexUTF16FromString(str)
            fmt.Println(hex)
        }

    Output:

        Bien joué
        004200690065006e0020006a006f007500e9
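
    A possible later refinement (a sketch only, not part of the original answer): build the big-endian byte sequence directly and hex-encode it with encoding/hex, which skips the Sprintf/Replace round-trip. The function name hexUTF16BEFromString is illustrative.

        package main

        import (
            "encoding/binary"
            "encoding/hex"
            "fmt"
            "unicode/utf16"
        )

        // hexUTF16BEFromString writes each UTF-16 code unit as two
        // big-endian bytes and hex-encodes the whole buffer.
        func hexUTF16BEFromString(s string) string {
            units := utf16.Encode([]rune(s))
            buf := make([]byte, 2*len(units))
            for i, u := range units {
                binary.BigEndian.PutUint16(buf[2*i:], u)
            }
            return hex.EncodeToString(buf)
        }

        func main() {
            fmt.Println(hexUTF16BEFromString("Bien joué"))
            // 004200690065006e0020006a006f007500e9
        }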

    NOTE:

    You say "convert a Unicode string to a UCS-2 hex string", but your Python example uses UTF-16:

    u"Bien joué".encode('utf-16-be').encode('hex')
    

    The Unicode Consortium

    UTF-16 FAQ

    Q: What is the difference between UCS-2 and UTF-16?

    A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

    UCS-2 does not describe a data format distinct from UTF-16, because both use exactly the same 16-bit code unit representations. However, UCS-2 does not interpret surrogate code points, and thus cannot be used to conformantly represent supplementary characters.

    Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.
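
    To make the UCS-2 vs. UTF-16 distinction concrete, here is a small sketch (not part of the original answer): a character outside the Basic Multilingual Plane comes out of Go's utf16.Encode as a surrogate pair, which a strict UCS-2 encoder could not represent.

        package main

        import (
            "fmt"
            "unicode/utf16"
        )

        func main() {
            // U+1F600 is outside the BMP, so UTF-16 needs a surrogate pair.
            units := utf16.Encode([]rune("\U0001F600"))
            fmt.Printf("%04x\n", units) // [d83d de00]
        }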


    This answer was accepted as the best answer by the asker.