dongzhenju3015 2017-12-05 18:14

Best way to convert UTF-8 to ISO8859-1 in Go

I'm trying to map UTF-8 characters to their "similar" ISO8859-1 representation: removing diacritics, but also replacing characters like Ł with L or ı with i.

Example: José Kakışır should become Jose Kakisir.

I'm aware that removing diacritics can be done this way:

// (From https://blog.golang.org/normalization#TOC_10.)
package main

import (
    "fmt"
    "unicode"

    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
)

func main() {
    // Mn: nonspacing marks (combining accents, cedillas, etc.)
    isMn := func(r rune) bool {
        return unicode.Is(unicode.Mn, r)
    }
    // Decompose (NFD), drop the nonspacing marks, then recompose (NFC).
    t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
    result, _, err := transform.String(t, "José Kakışır")
    if err != nil {
        panic(err)
    }
    fmt.Println(result)
}

Which prints Jose Kakısır: ş is replaced with s, but ı is not replaced with i.

What's the best way to achieve that in Go?


1 answer

  • doujie7497 2017-12-05 18:26

    I believe the charmap package (golang.org/x/text/encoding/charmap) does what you want with charmap.ISO8859_1.NewEncoder().

    Edit: never mind, that will fail on runes that ISO8859-1 cannot represent. Sorry. It may still be worth looking into this package some more, though.

    Ultimately, it feels like you will need to find (or create) your own mapping from UTF-8 to ISO8859-1. I don't think you'll find a "standard" one, though; the mapping is too arbitrary.
