doufeng1249 2014-07-23 07:18

Repost: Converting from []byte to string and vice versa

I always seem to be converting strings to []byte and back to string, over and over. Is there a lot of overhead in this? Is there a better way?

For example, here is a function that accepts a UTF-8 string, normalizes it, removes accents, and then converts special characters to their ASCII equivalents:

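package main

import (
    "bytes"
    "fmt"
    "unicode"

    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
)

// isMn reports whether r is a nonspacing mark (Unicode category Mn).
func isMn(r rune) bool { return unicode.Is(unicode.Mn, r) }

var transliterations = map[rune]string{'Æ': "AE", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th", 'ß': "ss", 'æ': "ae", 'ð': "d", 'ł': "l", 'ø': "oe", 'þ': "th", 'Œ': "OE", 'œ': "oe"}

func RemoveAccents(s string) string {
    b := make([]byte, len(s))
    t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
    n, _, e := t.Transform(b, []byte(s), true)
    if e != nil { panic(e) }
    r := string(b[:n]) // only the n bytes written, not trailing zero bytes

    var f bytes.Buffer
    for _, c := range r { // c is already a rune
        if val, ok := transliterations[c]; ok {
            f.WriteString(val)
        } else {
            f.WriteRune(c)
        }
    }
    return f.String()
}

func main() {
    fmt.Println(RemoveAccents("test stringß"))
}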

So I start with a string, because that's what I receive, then convert it to a byte slice, back to a string, to a byte slice again, and back to a string again. Surely this is unnecessary, but I can't figure out how to avoid it. And does it really have a lot of overhead, or do I not need to worry about excessive conversions slowing things down?
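Not every step in that chain is actually a conversion. Ranging over a string with `for _, c := range r` decodes the UTF-8 bytes into runes in place; it does not build a []byte. The small sketch below (the helper name `describeRunes` is made up for illustration) shows what such a loop yields: the byte offset where each rune starts, and the decoded rune itself.

```go
package main

import (
	"fmt"
	"strings"
)

// describeRunes is a hypothetical helper. It ranges over s directly,
// which decodes UTF-8 on the fly: i is the byte offset where each rune
// starts and r is the decoded rune. No []byte copy of s is made.
func describeRunes(s string) string {
	var sb strings.Builder
	for i, r := range s {
		fmt.Fprintf(&sb, "%d:%c ", i, r)
	}
	return strings.TrimSpace(sb.String())
}

func main() {
	fmt.Println(describeRunes("aß€")) // 0:a 1:ß 3:€
}
```

Because 'ß' takes two bytes in UTF-8, the euro sign starts at byte offset 3, not 2.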

(Also, if anyone has the time: I haven't yet figured out how bytes.Buffer actually works. Wouldn't it be better to initialize the buffer with twice the length of the string, since that is the maximum possible size of the return value?)
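On the bytes.Buffer question: a bytes.Buffer wraps a growable byte slice. When a write doesn't fit in the remaining capacity, it allocates a larger slice and copies the existing contents over, and `Grow(n)` lets you reserve room for n more bytes up front. For the transliterations table in the question, every key is a two-byte UTF-8 character and every replacement is at most two ASCII bytes, so for valid UTF-8 input that loop never writes more bytes than it reads; `len(input)` is already enough capacity for that step. A minimal sketch (`newSizedBuffer` is a hypothetical helper):

```go
package main

import (
	"bytes"
	"fmt"
)

// newSizedBuffer is a hypothetical helper that reserves capacity for at
// least n bytes up front, so writes totalling n bytes do not reallocate.
func newSizedBuffer(n int) *bytes.Buffer {
	var buf bytes.Buffer
	buf.Grow(n) // allocates once; buf.Len() is still 0
	return &buf
}

func main() {
	in := "Œuvre" // 6 bytes: 'Œ' is 2 bytes, "uvre" is 4
	buf := newSizedBuffer(len(in))
	before := buf.Cap()
	buf.WriteString("OEuvre") // also 6 bytes, fits in the reserved space
	fmt.Println(buf.String(), buf.Cap() == before) // OEuvre true
}
```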

  • Anonymous user (普通网友) 2014-07-23 16:39

    In Go, strings are immutable, so any change creates a new string, and converting between string and []byte generally copies the data. As a general rule, convert from a string to a byte or rune slice once, do all the work on the slice, and convert back to a string once. To avoid reallocations for small, transient buffers, over-allocate to leave a safety margin when you don't know the exact size in advance.
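To make the immutability point concrete: since a string can never change, `[]byte(s)` has to copy the bytes into a fresh, mutable slice, and `string(b)` has to copy them back. Each conversion is therefore O(n) work and usually an allocation (the compiler does elide the copy in a few special cases, such as a map lookup written as `m[string(b)]`). A minimal sketch; `roundTrip` is a hypothetical name:

```go
package main

import "fmt"

// roundTrip is a hypothetical helper: it converts s to a []byte (a copy),
// changes the first byte of that copy, and converts back (another copy).
func roundTrip(s string) string {
	b := []byte(s) // new, mutable slice; s itself is untouched
	if len(b) > 0 {
		b[0] = 'j'
	}
	return string(b) // copies b into a new, immutable string
}

func main() {
	s := "hello"
	t := roundTrip(s)
	fmt.Println(s, t) // hello jello
}
```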

    For example,

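    package main
    
    import (
        "bytes"
        "fmt"
        "unicode"
        "unicode/utf8"
    
        // Originally imported from code.google.com/p/go.text/...;
        // that repository now lives at golang.org/x/text.
        "golang.org/x/text/transform"
        "golang.org/x/text/unicode/norm"
    )
    
    // isMn reports whether r is a nonspacing mark (Unicode category Mn),
    // such as a combining accent left over after NFD decomposition.
    var isMn = func(r rune) bool {
        return unicode.Is(unicode.Mn, r)
    }
    
    // transliterations maps characters that have no canonical
    // decomposition to their ASCII equivalents.
    var transliterations = map[rune]string{
        'Æ': "AE", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th",
        'ß': "ss", 'æ': "ae", 'ð': "d", 'ł': "l", 'ø': "oe",
        'þ': "th", 'Œ': "OE", 'œ': "oe",
    }
    
    // RemoveAccents works on []byte throughout, so the caller converts
    // string -> []byte once on the way in and []byte -> string once on the way out.
    func RemoveAccents(b []byte) ([]byte, error) {
        // Over-allocate by 25% as a safety margin; Transform returns an
        // error (for example transform.ErrShortDst) if mnBuf is still too small.
        mnBuf := make([]byte, len(b)*125/100)
        // Decompose, drop nonspacing marks, then recompose.
        t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
        n, _, err := t.Transform(mnBuf, b, true)
        if err != nil {
            return nil, err
        }
        mnBuf = mnBuf[:n] // keep only the bytes actually written
        tlBuf := bytes.NewBuffer(make([]byte, 0, len(mnBuf)*125/100))
        // Decode one rune at a time directly from the byte slice.
        for i := 0; i < len(mnBuf); {
            r, width := utf8.DecodeRune(mnBuf[i:])
            if s, ok := transliterations[r]; ok {
                tlBuf.WriteString(s)
            } else {
                tlBuf.WriteRune(r)
            }
            i += width
        }
        return tlBuf.Bytes(), nil
    }
    
    func main() {
        in := "test stringß"
        fmt.Println(in)
        outBytes, err := RemoveAccents([]byte(in)) // string -> []byte once
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Println(string(outBytes)) // []byte -> string once
    }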
    

    Output:

    test stringß
    test stringss
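If, like the asker, you want to keep a string-in, string-out API, one option (a sketch, not part of the original answer) is strings.Builder, which was added in Go 1.10, after this answer was written. Ranging over the input string decodes runes without a []byte copy, and `Builder.String` returns the accumulated bytes without a final copy. The sketch covers only the transliteration stage so that it runs with the standard library alone; accent removal would still go through golang.org/x/text (for example `transform.String`, which applies a Transformer to a string). The function name `transliterate` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Same table as in the answer above.
var transliterations = map[rune]string{
	'Æ': "AE", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th",
	'ß': "ss", 'æ': "ae", 'ð': "d", 'ł': "l", 'ø': "oe",
	'þ': "th", 'Œ': "OE", 'œ': "oe",
}

// transliterate is a hypothetical string-in/string-out version of the
// answer's second stage only (no accent removal). It ranges over s
// directly, so s is never converted to a []byte.
func transliterate(s string) string {
	var sb strings.Builder
	// For this table and valid UTF-8 input the output is never longer
	// than s, so one up-front Grow avoids reallocation in that case.
	sb.Grow(len(s))
	for _, r := range s {
		if rep, ok := transliterations[r]; ok {
			sb.WriteString(rep)
		} else {
			sb.WriteRune(r)
		}
	}
	return sb.String() // no extra copy of the built bytes
}

func main() {
	fmt.Println(transliterate("test stringß")) // test stringss
}
```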
    
    This answer was accepted by the asker as the best answer.