dongshi1207 2018-06-15 15:15

String conversion and Unicode in Golang

I am reading Go Essentials:

String in Go is an immutable sequence of bytes (8-bit byte values). This is different than languages like Python, C#, Java or Swift where strings are Unicode.

I am playing around with the following code:

s := "日本語"
b := []byte{0xe6, 0x97, 0xa5, 0xe6, 0x9c, 0xac, 0xe8, 0xaa, 0x9e}
fmt.Println(string(b) == s) // true

for i, runeChar := range b {
    fmt.Printf("byte position %d: %#U\n", i, runeChar)
}

//byte position 0: U+00E6 'æ'
//byte position 1: U+0097
//byte position 2: U+00A5 '¥'
//byte position 3: U+00E6 'æ'
//byte position 4: U+009C
//byte position 5: U+00AC '¬'
//byte position 6: U+00E8 'è'
//byte position 7: U+00AA 'ª'
//byte position 8: U+009E

for i, runeChar := range string(b) {
    fmt.Printf("byte position %d: %#U\n", i, runeChar)
}

//byte position 0: U+65E5 '日'
//byte position 3: U+672C '本'
//byte position 6: U+8A9E '語'
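
For comparison, the second loop's behaviour can be reproduced by hand. This is only a sketch, assuming the bytes are UTF-8: it uses utf8.DecodeRune from the standard unicode/utf8 package, and decodeOffsets is a hypothetical helper name I made up, not something from the book.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeOffsets is a hypothetical helper. It walks b as UTF-8 and returns
// the starting byte offset and the code point of every decoded rune.
func decodeOffsets(b []byte) ([]int, []rune) {
	var offsets []int
	var runes []rune
	for i := 0; i < len(b); {
		// DecodeRune reads one UTF-8 sequence and reports its width in bytes.
		// Invalid input yields utf8.RuneError with a width of 1.
		r, size := utf8.DecodeRune(b[i:])
		offsets = append(offsets, i)
		runes = append(runes, r)
		i += size
	}
	return offsets, runes
}

func main() {
	b := []byte{0xe6, 0x97, 0xa5, 0xe6, 0x9c, 0xac, 0xe8, 0xaa, 0x9e}
	offsets, runes := decodeOffsets(b)
	for k, r := range runes {
		fmt.Printf("byte position %d: %#U\n", offsets[k], r)
	}
}
```

If that assumption holds, this should print the same three lines as the range loop over string(b), which suggests the runes come from decoding the bytes at iteration time rather than from anything stored in the string itself.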

Questions:

  1. Where does Go get the Unicode information when a byte slice is converted to a string? How is a rune formed? Does the Go compiler take it from the text encoding of the source file during compilation?

  2. What are the advantages and disadvantages of implementing a string as a byte array, instead of as an array of UTF-16 chars as in Java?
