dosro793520628
2012-11-22 10:18

Converting ISO 8859-1 to UTF-8 in Go

I am trying to convert an ISO 8859-1 encoded string to UTF-8.

The following function works with my test data, which contains German umlauts, but I'm not quite sure what source encoding the rune(b) conversion assumes. Does it assume some kind of default encoding, e.g. ISO 8859-1, or is there any way to tell it what encoding to use?

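import "bytes"

func toUtf8(iso8859_1_buf []byte) string {
   // Length 0, capacity 2*len: each ISO 8859-1 byte needs at most 2 bytes in UTF-8.
   // A non-zero length would leave leading NUL bytes in the result.
   buf := bytes.NewBuffer(make([]byte, 0, len(iso8859_1_buf)*2))
   for _, b := range iso8859_1_buf {
      buf.WriteRune(rune(b))
   }
   return buf.String()
}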

2 answers

  • douruanfan3030
    douruanfan3030 2012-11-22 11:11
    Accepted

    rune is an alias for int32, and when it comes to encoding, a rune is assumed to hold a Unicode code point. So the value b in rune(b) is interpreted as a Unicode code point. For 0x00 - 0xFF, the Unicode code points are identical to ISO 8859-1 (Latin-1), so you don't have to worry about it.

    Then you need to encode the runes into UTF-8, but that is done simply by converting a []rune to a string.

    This is an example of your function without using the bytes package:

    func toUtf8(iso8859_1_buf []byte) string {
        buf := make([]rune, len(iso8859_1_buf))
        for i, b := range iso8859_1_buf {
            buf[i] = rune(b)
        }
        return string(buf)
    }
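
As a minimal, runnable sketch of the same idea, the program below wraps the conversion in a helper and prints the resulting UTF-8 bytes. The name latin1ToUTF8 and the sample input are assumptions made for illustration, and strings.Builder is swapped in for the []rune slice; it is not part of the original answer.

```go
package main

import (
	"fmt"
	"strings"
)

// latin1ToUTF8 is a hypothetical helper name. It treats every input byte
// as an ISO 8859-1 code, which equals the Unicode code point of the same value.
func latin1ToUTF8(latin1 []byte) string {
	var sb strings.Builder
	sb.Grow(2 * len(latin1)) // each Latin-1 byte needs at most 2 bytes in UTF-8
	for _, b := range latin1 {
		sb.WriteRune(rune(b)) // WriteRune performs the UTF-8 encoding
	}
	return sb.String()
}

func main() {
	in := []byte{0x47, 0x72, 0xFC, 0xDF, 0x65} // "Grüße" in ISO 8859-1
	out := latin1ToUTF8(in)
	fmt.Printf("%s % x\n", out, out) // prints: Grüße 47 72 c3 bc c3 9f 65
}
```

The two non-ASCII bytes (0xFC and 0xDF) each become a two-byte UTF-8 sequence, while the ASCII bytes pass through unchanged.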
    
  • dongnuo6310
    dongnuo6310 2012-11-22 11:16

    The effect of

    r := rune(expression)
    

    is:

    • Declare variable r with type rune (alias for int32).
    • Initialize variable r with the value of expression.

    No (re)encoding is involved, and choosing a source encoding is possible only by explicitly writing the re-encoding in code. Luckily, in this case no remapping of values is necessary: Unicode incorporated the codes of ISO 8859-1 in the same way as it did those of ASCII, so the byte value can be used directly as the code point. (If I checked correctly here)
