dragon012100 2015-07-05 22:25

golang unicode/norm iterator does not read the last rune

I'm using the golang.org/x/text/unicode/norm package to iterate over runes in a []byte. I've chosen this approach as I need to inspect each rune and maintain information about the sequence of runes. The last call to iter.Next() does not read the last rune. It gives 0 bytes read on the last rune.

Here is the code:

package main

import (
  "fmt"
  "unicode/utf8"

  "golang.org/x/text/unicode/norm"
)

func main() {
  var (
    n   int
    r   rune
    it  norm.Iter
    out []byte
  )
  in := []byte(`test`)
  fmt.Printf("%s\n", in)
  fmt.Println(in)
  it.Init(norm.NFD, in)
  for !it.Done() {
    ruf := it.Next()
    r, n = utf8.DecodeRune(ruf)
    fmt.Printf("bytes read: %d. val: %q\n", n, r)
    buf := make([]byte, utf8.RuneLen(r))
    utf8.EncodeRune(buf, r)
    out = norm.NFC.Append(out, buf...)
  }
  fmt.Printf("%s\n", out)
  fmt.Println(out)
}

This produces the following output:

test
[116 101 115 116]
bytes read: 1. val: 't'
bytes read: 1. val: 'e'
bytes read: 1. val: 's'
bytes read: 0. val: '�'
tes�
[116 101 115 239 191 189]

1 answer

  • douxian4376 2015-07-06 00:49

    This may be a bug in the Init() method of golang.org/x/text/unicode/norm.

    The package's tests and examples that I have seen all use InitString. So as a workaround, if you change:

     it.Init(norm.NFD, in)
    

    to:

     it.InitString(norm.NFD, `test`)
    

    things will work as expected.

    I would suggest opening a bug report, but be aware that because this package lives under the "/x" directory, the Go developers consider it experimental.

    (BTW, I used my Go debugger to help me track down what's going on, but I should say it is still far from the kind of debugger I'd like to see.)

    This answer was selected as the best answer by the asker.
