I'd like some help understanding the unicode package's RangeTable.
Using this (supposedly helpful) function:
func printChars(ranges []unicode.Range16) {
	for _, r := range ranges {
		if r.Hi >= 0x80 { // show only ascii
			break
		}
		fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
		for c := r.Lo; c <= r.Hi; c++ {
			fmt.Print(string(rune(c)) + " ")
		}
	}
	fmt.Println()
}
For digits, I can do printChars(unicode.Digit.R16), and the sequence of digits makes sense to me.
// Lo: 48 Hi: 57 Stride: 1
// 0 1 2 3 4 5 6 7 8 9
However, for punctuation, printChars(unicode.Punct.R16) results in
// Lo: 33 Hi: 35 Stride: 1
// ! " #
// Lo: 37 Hi: 42 Stride: 1
// % & ' ( ) *
// Lo: 44 Hi: 47 Stride: 1
// , - . /
// Lo: 58 Hi: 59 Stride: 1
// : ;
// Lo: 63 Hi: 64 Stride: 1
// ? @
// Lo: 91 Hi: 93 Stride: 1
// [ \ ]
// Lo: 95 Hi: 123 Stride: 28
// _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z {
I'm surprised that the lowercase letters are included too. Also, what does "Stride" mean? It's 1 for all but the last range, but the Hi-Lo difference varies.
As another example, take printChars(unicode.Pe.R16). I thought this should give only the end punctuation:
- ) right parenthesis (U+0029, Pe)
- ] right square bracket (U+005D, Pe)
- } right curly bracket (U+007D, Pe)
But instead my function prints
// Lo: 41 Hi: 93 Stride: 52
// ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ]
Presumably I'm completely misunderstanding the way this is supposed to work.
How might I correctly get a list of characters in a given category, for example, Punctuation End (Pe) as above?