douchun1859 2015-05-31 10:56
Views: 45
Accepted

String to UCS-2

I want to port my Python program to Go. It converts a Unicode string to a UCS-2 hex string.

In Python, it's quite simple:

u"Bien joué".encode('utf-16-be').encode('hex')
-> 004200690065006e0020006a006f007500e9

I am a beginner in Go and the simplest way I found is:

package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "Bien joué" 
    fmt.Printf("str: %s\n", str)

    ucs2HexArray := []rune(str)
    s := fmt.Sprintf("%U", ucs2HexArray)
    a := strings.Replace(s, "U+", "", -1)
    b := strings.Replace(a, "[", "", -1)
    c := strings.Replace(b, "]", "", -1)
    d := strings.Replace(c, " ", "", -1)
    fmt.Printf("->: %s", d)
}

str: Bien joué
->: 004200690065006E0020006A006F007500E9
Program exited.

I think this is clearly inefficient. How can I improve it?

Thank you

3 answers

  • douzhulan1815 2015-05-31 13:24

    Make this conversion a function; then you can easily improve the conversion algorithm in the future. For example:

    package main
    
    import (
        "fmt"
        "strings"
        "unicode/utf16"
    )
    
    func hexUTF16FromString(s string) string {
        hex := fmt.Sprintf("%04x", utf16.Encode([]rune(s)))
        return strings.Replace(hex[1:len(hex)-1], " ", "", -1)
    }
    
    func main() {
        str := "Bien joué"
        fmt.Println(str)
        hex := hexUTF16FromString(str)
        fmt.Println(hex)
    }
    

    Output:

    Bien joué
    004200690065006e0020006a006f007500e9
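
As one possible improvement on the function above, here is a minimal sketch that skips the `fmt` formatting and string replacement and writes the big-endian bytes directly. The name `hexUTF16BE` is an assumption made for this illustration; it only uses the standard `unicode/utf16`, `encoding/binary` and `encoding/hex` packages.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// hexUTF16BE (hypothetical name) encodes s as UTF-16 big-endian
// and returns the bytes as a lowercase hex string.
func hexUTF16BE(s string) string {
	// Convert the string to UTF-16 code units (surrogate pairs included).
	units := utf16.Encode([]rune(s))

	// Each code unit takes two bytes, most significant byte first.
	buf := make([]byte, 2*len(units))
	for i, u := range units {
		binary.BigEndian.PutUint16(buf[2*i:], u)
	}
	return hex.EncodeToString(buf)
}

func main() {
	fmt.Println(hexUTF16BE("Bien joué"))
}
```

For the sample string this should give the same hex string as the output shown above. Whether it is measurably faster depends on the input, so benchmark it before relying on that.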
    

    NOTE:

    You say "convert an unicode string to a UCS-2 string" but your Python example uses UTF-16:

    u"Bien joué".encode('utf-16-be').encode('hex')
    

    The Unicode Consortium

    UTF-16 FAQ

    Q: What is the difference between UCS-2 and UTF-16?

    A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

    UCS-2 does not describe a data format distinct from UTF-16, because both use exactly the same 16-bit code unit representations. However, UCS-2 does not interpret surrogate code points, and thus cannot be used to conformantly represent supplementary characters.

    Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.

    This answer was selected by the asker as the best answer.