doufeng1249 2014-07-23 07:18

Repost: Converting from []byte to string and vice versa

I always seem to be converting strings to []byte and back to strings, over and over. Does this carry a lot of overhead? Is there a better way?

For example, here is a function that accepts a UTF-8 string, normalizes it, removes accents, then converts special characters to their ASCII equivalents:

var transliterations = map[rune]string{
    'Æ': "AE", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th",
    'ß': "ss", 'æ': "ae", 'ð': "d", 'ł': "l", 'ø': "oe",
    'þ': "th", 'Œ': "OE", 'œ': "oe",
}

func RemoveAccents(s string) string {
    b := make([]byte, len(s))
    t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
    // n is the number of bytes written; without it, unused trailing zero bytes end up in the result.
    n, _, e := t.Transform(b, []byte(s), true)
    if e != nil {
        panic(e)
    }
    r := string(b[:n])

    var f bytes.Buffer
    for _, c := range r {
        temp := rune(c)
        if val, ok := transliterations[temp]; ok {
            f.WriteString(val)
        } else {
            f.WriteRune(temp)
        }
    }
    return f.String()
}

So I start with a string because that's what I get, convert it to a byte slice, then back to a string, then to a byte buffer, then back to a string again. Surely this is unnecessary, but I can't figure out how to avoid it. Does it really have a lot of overhead, or do I not need to worry about excessive conversions slowing things down?

(Also, if anyone has the time: I haven't yet figured out how bytes.Buffer actually works. Would it not be better to initialize a buffer of 2x the size of the string, which is the maximum output size of the return value?)

  • Anonymous user 2014-07-23 16:39

    In Go, strings are immutable, so any change creates a new string. As a general rule, convert from a string to a byte or rune slice once, and convert back to a string once. To avoid reallocations for small, transient buffers, over-allocate to leave a safety margin when you don't know the exact size in advance.
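
As a rough illustration of the point above (a minimal sketch, not a benchmark): converting a string to a []byte yields an independent copy, so each round trip conceptually costs an allocation and a copy proportional to the length, although the compiler can elide the copy in some cases. `ToUpperASCII` is a hypothetical helper introduced only for this illustration; it converts once in each direction and does all edits on the slice in between.

```go
package main

import "fmt"

// ToUpperASCII is a hypothetical helper (not from the original answer).
// It converts to []byte once, edits in place, and converts back once.
func ToUpperASCII(s string) string {
	b := []byte(s) // the string's bytes are copied into a new slice
	for i, c := range b {
		if 'a' <= c && c <= 'z' {
			b[i] = c - ('a' - 'A')
		}
	}
	return string(b) // the slice is copied into a new string
}

func main() {
	s := "hello"
	b := []byte(s)
	b[0] = 'J' // modifies only the copy; s is unchanged
	fmt.Println(s, string(b))               // hello Jello
	fmt.Println(ToUpperASCII("go strings")) // GO STRINGS
}
```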

    For example,

    package main
    
    import (
        "bytes"
        "fmt"
        "unicode"
        "unicode/utf8"
    
        "golang.org/x/text/transform"
        "golang.org/x/text/unicode/norm"
    )
    
    var isMn = func(r rune) bool {
        return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
    }
    
    var transliterations = map[rune]string{
        'Æ': "AE", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th",
        'ß': "ss", 'æ': "ae", 'ð': "d", 'ł': "l", 'ø': "oe",
        'þ': "th", 'Œ': "OE", 'œ': "oe",
    }
    
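    // RemoveAccents strips combining marks and transliterates special characters.
    // It works on []byte throughout, so the caller converts at most once each way.
    func RemoveAccents(b []byte) ([]byte, error) {
        // Over-allocate by 25% as a safety margin; Transform reports an error if this is still too small.
        mnBuf := make([]byte, len(b)*125/100)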
        t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
        n, _, err := t.Transform(mnBuf, b, true)
        if err != nil {
            return nil, err
        }
        mnBuf = mnBuf[:n]
        tlBuf := bytes.NewBuffer(make([]byte, 0, len(mnBuf)*125/100))
        for i := 0; i < len(mnBuf); {
            // Decode one rune at a time directly from the byte slice.
            r, width := utf8.DecodeRune(mnBuf[i:])
            if s, ok := transliterations[r]; ok {
                tlBuf.WriteString(s)
            } else {
                tlBuf.WriteRune(r)
            }
            i += width
        }
        return tlBuf.Bytes(), nil
    }
    
    func main() {
        in := "test stringß"
        fmt.Println(in)
        inBytes := []byte(in)
        outBytes, err := RemoveAccents(inBytes)
        if err != nil {
            fmt.Println(err)
            return
        }
        out := string(outBytes)
        fmt.Println(out)
    }
    

    Output:

    test stringß
    test stringss
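
On the side question about pre-sizing the buffer: a bytes.Buffer grows on demand, so pre-sizing only saves reallocations. For this particular table, every key encodes to two bytes in UTF-8 and every replacement is at most two bytes long, so the transliteration step never produces more bytes than it reads (assuming valid UTF-8 input). The sketch below covers only that step, not the normalization, and uses strings.Builder, which was added in Go 1.10 and so was not available when this answer was written. `Transliterate` is a hypothetical name chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

var transliterations = map[rune]string{
	'Æ': "AE", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th",
	'ß': "ss", 'æ': "ae", 'ð': "d", 'ł': "l", 'ø': "oe",
	'þ': "th", 'Œ': "OE", 'œ': "oe",
}

// Transliterate is a hypothetical helper covering only the table lookup step.
func Transliterate(s string) string {
	var sb strings.Builder
	// For valid UTF-8 input the output is never longer than the input,
	// because each 2-byte key maps to at most 2 bytes. The builder still
	// grows by itself if that assumption does not hold.
	sb.Grow(len(s))
	// Ranging over a string decodes runes without a []byte conversion.
	for _, r := range s {
		if t, ok := transliterations[r]; ok {
			sb.WriteString(t)
		} else {
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(Transliterate("test stringß")) // test stringss
	fmt.Println(Transliterate("Łódź Æther"))   // Lódź AEther
}
```

Characters that are not in the table, such as the ó and ź above, pass through unchanged here; in the full function they are handled by the normalization step that runs first.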
    
    Accepted by the asker as the best answer.