dqyl2374
2015-09-11 07:58
Views: 1.5k
Accepted

How do I convert from an encoding to UTF-8 in Go?

I'm working on a project where I need to convert text from an encoding (for example Windows-1256 Arabic) to UTF-8.

How do I do this in Go?



2 answers

  • duanhuayong6687 2015-09-11 08:13
    Accepted

    You can use the golang.org/x/text/encoding packages. Windows-1256 is supported by golang.org/x/text/encoding/charmap: in the example below, import that package and use charmap.Windows1256 instead of japanese.ShiftJIS.

    Here's a short example that encodes a Japanese UTF-8 string to ShiftJIS and then decodes the ShiftJIS string back to UTF-8. Unfortunately, it doesn't work on the playground, since the playground doesn't have the "x" packages.

    package main
    
    import (
        "bytes"
        "fmt"
        "io/ioutil"
        "strings"
    
        "golang.org/x/text/encoding/japanese"
        "golang.org/x/text/transform"
    )
    
    func main() {
        // the string we want to transform
        s := "今日は"
        fmt.Println(s)
    
        // --- Encoding: convert s from UTF-8 to ShiftJIS 
        // declare a bytes.Buffer b and an encoder which will write into this buffer
        var b bytes.Buffer
        wInUTF8 := transform.NewWriter(&b, japanese.ShiftJIS.NewEncoder())
        // encode our string
        wInUTF8.Write([]byte(s))
        wInUTF8.Close()
        // print the encoded bytes
        fmt.Printf("%#v\n", b)
        encS := b.String()
        fmt.Println(encS)
    
        // --- Decoding: convert encS from ShiftJIS to UTF8
        // declare a decoder which reads from the string we have just encoded
        rInUTF8 := transform.NewReader(strings.NewReader(encS), japanese.ShiftJIS.NewDecoder())
        // decode our string
        decBytes, _ := ioutil.ReadAll(rInUTF8)
        decS := string(decBytes)
        fmt.Println(decS)
    }
    

    There's a more complete example on the Japanese StackOverflow site. The text is Japanese, but the code should be self-explanatory: https://ja.stackoverflow.com/questions/6120

  • duandaishi9268 2015-09-11 09:25

    Use the packages from golang.org/x/text. In your case, it would look something like this:

    var b []byte // Win1256 bytes go here.
    dec := charmap.Windows1256.NewDecoder()
    // Allocate extra space, because a character that takes one byte
    // in Win1256 can take up to three bytes in UTF-8.
    bUTF := make([]byte, len(b)*3)
    // atEOF is true because b holds the complete input.
    n, _, err := dec.Transform(bUTF, b, true)
    if err != nil {
        panic(err)
    }
    bUTF = bUTF[:n]
    
