duandiaoqian5795 2018-10-19 14:25
Views: 642

Converting UTF-8 to a single-byte encoding

I have a batch of incorrectly encoded records. This one-liner gives me the correct result:

cat example.txt | iconv -f utf-8 -t iso8859-2

But the following program gives me the error "encoding: rune not supported by encoding":

package main

import (
    "fmt"

    "golang.org/x/text/encoding/charmap"
)

func main() {
    // UTF-8 bytes of one wrongly encoded record.
    s := []byte{196, 144, 194, 154, 196, 144, 194, 176, 196, 144, 197, 186, 196, 144, 196, 190, 197, 131, 194, 128, 196, 144, 194, 176, 32, 52, 52, 53, 54, 50, 53, 54, 10, 10, 0, 0}
    fmt.Println(s)

    // NewEncoder converts UTF-8 to ISO 8859-2, so the variable is named enc rather than dec.
    enc := charmap.ISO8859_2.NewEncoder()
    out, err := enc.Bytes(s)
    if err != nil {
        fmt.Println(err)
        return
    }
    expectedOutput := "Камера 4456256"
    fmt.Println("result", string(out), "expect:", expectedOutput)
}

I'm wondering whether my problem can be solved without iconv bindings.

1 answer

  • duanmu2013 2018-10-19 14:57

    Searching for charmap.ISO8859_2 suggests that you are using golang.org/x/text.

    Here we see how the transformation is done, given a Charmap:

    https://github.com/golang/text/blob/4d1c5fb19474adfe9562c9847ba425e7da817e81/encoding/charmap/charmap.go#L206

    The highlighted line shows where the error comes from. So your input contains either UTF-8 characters that can't be represented in ISO 8859-2, or invalid UTF-8.

    Here you can see that the error is handed to you unchanged; the use of replacement inside the RepertoireError seems to be a red herring.

    Of course you don't need iconv bindings. You can just iterate through your input character by character, encode each one as ISO 8859-2, and decide for yourself what to do with unrepresentable characters.
