doutang3760 2015-04-16 21:55

Reading Unicode characters with a bufio scanner in Go

I'm trying to read a plain text file that contains names like this: "CASTAÑEDA"

The code is basically like this:

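package main
import (
	"bufio"
	"fmt"
	"log"
	"os"
)
func main() {
	file, err := os.Open("C:/Files/file.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close() // defer Close only after Open has succeeded
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}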

When "CASTAÑEDA" is read, it is printed as "CASTA�EDA".

Is there any way to handle these characters when reading with bufio?

Thanks.

  • dongxindu8753 2015-04-17 06:38

    The issue you're encountering is that your input is most likely not UTF-8, which is the encoding Go strings and most of the standard library assume. Instead, your input probably uses an extended-ASCII codepage such as ISO-8859-1 or Windows-1252, where 'Ñ' is the single byte 0xD1 (in UTF-8 it is the two bytes 0xC3 0x91). That is why the unaccented characters pass through cleanly (UTF-8 is a superset of 7-bit ASCII) while 'Ñ' does not.

    In this situation, the byte representation of the accented character is not valid UTF-8, so when the text is decoded for display it shows up as the Unicode replacement character (U+FFFD). You've got a few options:
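To see the difference at the byte level, the sketch below compares 'Ñ' as a single Latin-1 byte with its UTF-8 encoding, using only fmt and unicode/utf8. It assumes the file is ISO-8859-1 or Windows-1252; the helper name describe is hypothetical. Converting the invalid byte to runes is exactly where U+FFFD appears.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe (hypothetical helper) shows the bytes, whether they are valid
// UTF-8, and the runes Go decodes from them.
func describe(b []byte) string {
	return fmt.Sprintf("% X valid=%t runes=%U", b, utf8.Valid(b), []rune(string(b)))
}

func main() {
	latin1 := []byte{0xD1}   // 'Ñ' in ISO-8859-1 / Windows-1252 (assumed file encoding)
	utf8Bytes := []byte("Ñ") // 'Ñ' in UTF-8
	fmt.Println(describe(latin1))    // D1 valid=false runes=[U+FFFD]
	fmt.Println(describe(utf8Bytes)) // C3 91 valid=true runes=[U+00D1]
}
```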

    1. Convert your input files to UTF-8 before passing them to Go. Many utilities can do this, and editors often have this feature.
    2. Decode your input to UTF-8 with golang.org/x/text/encoding/charmap (for example charmap.Windows1252 or charmap.ISO8859_1) together with NewReader from golang.org/x/text/transform, then pass the resulting Reader to bufio.NewScanner.
    3. Change the line in the loop to os.Stdout.Write(scanner.Bytes()); fmt.Println() so the raw bytes are written without any conversion. Note, however, that fmt.Println(scanner.Text()) already writes the bytes unchanged (the default line splitting in bufio.Scanner does not decode UTF-8), so whether 'Ñ' displays correctly still depends on the terminal or console decoding the output with the same codepage as the file.
    This answer was accepted by the asker as the best answer.