I'm trying to map UTF-8 characters to their "similar" ISO 8859-1 representation. That means removing diacritics, but also replacing characters like Ł with L or ı with i.
Example: José Kakışır should become Jose Kakisir.
I'm aware that removing diacritics can be done this way:
// (From https://blog.golang.org/normalization#TOC_10.)
package main

import (
	"unicode"

	"golang.org/x/text/transform"
	"golang.org/x/text/unicode/norm"
)

func main() {
	// Mn: nonspacing marks
	isMn := func(r rune) bool {
		return unicode.Is(unicode.Mn, r)
	}
	// Decompose, strip the combining marks, then recompose.
	t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
	result, _, err := transform.String(t, "José Kakışır")
	if err != nil {
		panic(err)
	}
	println(result)
}
This prints Jose Kakısır: ş is replaced with s, but ı is not replaced with i.
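As far as I can tell, this is because ı and Ł have no Unicode decomposition, so NFD never splits them into a base letter plus a combining mark. The only workaround I can think of is a hand-written table applied after the normalization step. Here is a rough sketch using only the standard library; foldTable and foldSpecial are names I made up, and the table is deliberately incomplete:

```go
package main

import (
	"fmt"
	"strings"
)

// foldTable is a hypothetical, hand-maintained table of characters that have
// no Unicode decomposition, so NFD plus mark removal leaves them untouched.
// It only covers a few letters and is not meant to be complete.
var foldTable = map[rune]rune{
	'Ł': 'L',
	'ł': 'l',
	'ı': 'i',
	'Ø': 'O',
	'ø': 'o',
	'Đ': 'D',
	'đ': 'd',
}

// foldSpecial replaces every rune found in foldTable and keeps all others.
func foldSpecial(s string) string {
	return strings.Map(func(r rune) rune {
		if repl, ok := foldTable[r]; ok {
			return repl
		}
		return r
	}, s)
}

func main() {
	// The input here is the output of the normalization chain above.
	fmt.Println(foldSpecial("Jose Kakısır")) // Jose Kakisir
}
```

This works for the example, but maintaining such a table by hand doesn't seem like it would scale, and it can't handle one-to-many replacements without switching to something like strings.NewReplacer.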
What's the best way to achieve that in Go?