How does unicode.RangeTable work?

I'd like some help understanding the unicode package's RangeTable.

I'm using this (supposedly helpful) function:

func printChars(ranges []unicode.Range16) {
  for _, r := range ranges {

    if r.Hi >= 0x80 { // show only ascii
      break
    }
    fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)

    for c := r.Lo; c <= r.Hi; c++ {
      fmt.Print(string(rune(c)) + " ") // convert via rune so the uint16 is treated as a code point
    }
  }
  fmt.Println()
}

For digits, I can call printChars(unicode.Digit.R16), and the sequence of digits makes sense to me.

 // Lo: 48 Hi: 57 Stride: 1
 // 0 1 2 3 4 5 6 7 8 9

However, for punctuation, printChars(unicode.Punct.R16) results in

 // Lo: 33 Hi: 35 Stride: 1
 // ! " #
 // Lo: 37 Hi: 42 Stride: 1
 // % & ' ( ) *
 // Lo: 44 Hi: 47 Stride: 1
 //  , - . /
 // Lo: 58 Hi: 59 Stride: 1
 // : ;
 // Lo: 63 Hi: 64 Stride: 1
 // ? @
 // Lo: 91 Hi: 93 Stride: 1
 // [ \ ]
 // Lo: 95 Hi: 123 Stride: 28
 // _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z {

I'm surprised that the lower case letters are included too. Also, what does "Stride" mean? It's 1 for all but the last, but the hi-lo difference varies.

As another example, take printChars(unicode.Pe.R16). I thought this should give only the end punctuation:

) right parenthesis (U+0029, Pe)
] right square bracket (U+005D, Pe)
} right curly bracket (U+007D, Pe)

But instead my function prints

 // Lo: 41 Hi: 93 Stride: 52
 // ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ]

Presumably I'm completely misunderstanding the way this is supposed to work.

How might I correctly get a list of characters in a given category, for example, Punctuation End (Pe) as above?

dsa456369 2013-11-24 16:12

Stride is the step by which you have to iterate over the range: the table holds only Lo, Lo+Stride, Lo+2*Stride, and so on up to Hi, not every code point in between. Let's raise the 0x80 boundary a bit and make the loop iterate using Stride:

package main

import (
    "fmt"
    "unicode"
)

func printChars(ranges []unicode.Range16) {
  for _, r := range ranges {

    if r.Hi >= 0x100 {
      break
    }
    fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)

    for c := r.Lo; c <= r.Hi; c+=r.Stride {
      fmt.Print(string(rune(c)) + " ") // convert via rune so the uint16 is treated as a code point
    }
  }
  fmt.Println()
}

func main() {
    printChars(unicode.Punct.R16)
}

And here is the output:

% go run main.go

Lo: 33 Hi: 35 Stride: 1
! " # 
Lo: 37 Hi: 42 Stride: 1
% & ' ( ) * 
Lo: 44 Hi: 47 Stride: 1
, - . / 
Lo: 58 Hi: 59 Stride: 1
: ; 
Lo: 63 Hi: 64 Stride: 1
? @ 
Lo: 91 Hi: 93 Stride: 1
[ \ ] 
Lo: 95 Hi: 123 Stride: 28
_ { 
Lo: 125 Hi: 161 Stride: 36
} ¡ 
Lo: 167 Hi: 171 Stride: 4
§ « 
Lo: 182 Hi: 183 Stride: 1
¶ · 
Lo: 187 Hi: 191 Stride: 4
» ¿

Looks pretty much correct to me.

This answer was selected by the asker as the best answer.
