I'm using the golang.org/x/text/unicode/norm package to iterate over the runes in a []byte. I chose this approach because I need to inspect each rune and keep track of information about the sequence of runes. The problem is that the last call to it.Next() does not return the last rune: it comes back empty, so utf8.DecodeRune reports 0 bytes read for it.
Here is the code:
package main

import (
	"fmt"
	"unicode/utf8"

	"golang.org/x/text/unicode/norm"
)

func main() {
	var (
		n   int
		r   rune
		it  norm.Iter
		out []byte
	)
	in := []byte(`test`)
	fmt.Printf("%s\n", in)
	fmt.Println(in)
	it.Init(norm.NFD, in)
	for !it.Done() {
		// Next returns the next normalized segment; only its first rune is decoded here.
		ruf := it.Next()
		r, n = utf8.DecodeRune(ruf)
		fmt.Printf("bytes read: %d. val: %q\n", n, r)
		// Re-encode the rune and append it to out in NFC form.
		buf := make([]byte, utf8.RuneLen(r))
		utf8.EncodeRune(buf, r)
		out = norm.NFC.Append(out, buf...)
	}
	fmt.Printf("%s\n", out)
	fmt.Println(out)
}
This produces the following output:
test
[116 101 115 116]
bytes read: 1. val: 't'
bytes read: 1. val: 'e'
bytes read: 1. val: 's'
bytes read: 0. val: '�'
tes�
[116 101 115 239 191 189]
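For comparison, here is a minimal stdlib-only sketch that decodes the same input directly with unicode/utf8, with no normalization involved. The helper name decodeAll is hypothetical and not part of any package. It shows the output the norm.Iter loop above would be expected to produce: four runes, each 1 byte. It also loops over every rune in a slice instead of decoding only the first one. That matters when you apply it to the slices returned by Next, because a normalization segment can hold more than one rune, for example a base character followed by combining marks.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll (hypothetical helper) walks b one rune at a time with
// utf8.DecodeRune and returns each rune with the number of bytes it used.
// Invalid bytes decode as utf8.RuneError with size 1, so the loop always advances.
func decodeAll(b []byte) (runes []rune, sizes []int) {
	for len(b) > 0 {
		r, n := utf8.DecodeRune(b)
		runes = append(runes, r)
		sizes = append(sizes, n)
		b = b[n:]
	}
	return runes, sizes
}

func main() {
	in := []byte("test")
	runes, sizes := decodeAll(in)
	for i, r := range runes {
		fmt.Printf("bytes read: %d. val: %q\n", sizes[i], r)
	}
}
```

Running it prints `bytes read: 1.` for each of 't', 'e', 's' and 't', including the final rune.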