Can a character span multiple runes in Go?

I read the following on this blog:

Even with rune slices a single character might span multiple runes, which can happen if you have characters with grave accent, for example. This complicated and ambiguous nature of "characters" is the reason why Go strings are represented as byte sequences.

Is it true? (It seems to be a blog by someone who knows Go.) I tested on my machine, and "è" is 1 rune and 2 bytes. The Go documentation also seems to say otherwise.

Have you encountered such characters (in UTF-8)? Can a character span multiple runes in Go?

1 answer
dtdt0454 2016-04-12 09:29
Yes it can:

    s := "é́́"
    fmt.Println(s, []rune(s))

Output (try it on the Go Playground):

é́́ [101 769 769 769]

One character, 4 runes. It may be arbitrarily long...
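To reconcile this with the "è is 1 rune and 2 bytes" observation from the question, here is a minimal sketch that compares the precomposed and decomposed spellings. The helper name `counts` is made up for illustration, and the strings are written with `\u` escapes so the code points are unambiguous regardless of how an editor normalizes the text.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts reports the number of bytes and the number of runes (code points) in s.
// (Hypothetical helper, not part of any library.)
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	samples := []string{
		"\u00e8",              // 'è' as one precomposed code point
		"e\u0300",             // 'e' followed by a combining grave accent
		"e\u0301\u0301\u0301", // 'e' followed by three combining acute accents
	}
	for _, s := range samples {
		b, r := counts(s)
		// %+q prints the string with non-ASCII code points escaped.
		fmt.Printf("%+q: %d bytes, %d runes\n", s, b, r)
	}
}
```

The precomposed form is what a keyboard usually produces, which is probably why the test in the question saw a single rune; the decomposed forms render the same way but contain more code points.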

Example taken from The Go Blog: Text Normalization in Go.

What is a character?

As was mentioned in the strings blog post, characters can span multiple runes. For example, an 'e' and '◌́' (acute "\u0301") can combine to form 'é' ("e\u0301" in NFD). Together these two runes are one character. The definition of a character may vary depending on the application. For normalization we will define it as a sequence of runes that starts with a starter, a rune that does not modify or combine backwards with any other rune, followed by a possibly empty sequence of non-starters, that is, runes that do (typically accents). The normalization algorithm processes one character at a time.
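The "starter followed by non-starters" definition can be approximated with the standard library alone. The sketch below is only an approximation: it treats any rune in the Unicode category Mn (nonspacing mark) as a non-starter, whereas the real normalization algorithm uses canonical combining classes. The function name `splitCharacters` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// splitCharacters groups the runes of s into "characters": each group begins
// with a rune that is not a nonspacing mark and collects the marks after it.
// Assumption: category Mn stands in for "non-starter"; this is a simplification.
func splitCharacters(s string) [][]rune {
	var groups [][]rune
	for _, r := range s {
		if unicode.Is(unicode.Mn, r) && len(groups) > 0 {
			// A combining mark attaches to the most recent group.
			last := len(groups) - 1
			groups[last] = append(groups[last], r)
			continue
		}
		// Anything else starts a new group.
		groups = append(groups, []rune{r})
	}
	return groups
}

func main() {
	for _, g := range splitCharacters("e\u0301\u0301\u0301a\u0300b") {
		fmt.Printf("%+q has %d rune(s)\n", string(g), len(g))
	}
}
```

For production use, the golang.org/x/text/unicode/norm package described in the blog post is the better choice, since it follows the Unicode normalization rules rather than this category shortcut.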

A character can be followed by any number of modifiers (modifiers can be repeated and stacked):

Theoretically, there is no bound to the number of runes that can make up a Unicode character. In fact, there are no restrictions on the number of modifiers that can follow a character and a modifier may be repeated, or stacked. Ever seen an 'e' with three acutes? Here you go: 'é́́'. That is a perfectly valid 4-rune character according to the standard.
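As a rough illustration of "no bound", the following sketch builds a character from a base rune plus any number of combining acute accents and counts the resulting runes. The helper `stack` is invented for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// stack returns base followed by n copies of the combining acute accent U+0301.
// (Hypothetical helper for illustration.)
func stack(base rune, n int) string {
	return string(base) + strings.Repeat("\u0301", n)
}

func main() {
	for _, n := range []int{0, 1, 3, 10} {
		s := stack('e', n)
		// Every accent adds exactly one rune to the same visual character.
		fmt.Printf("%d accents: %d runes\n", n, utf8.RuneCountInString(s))
	}
}
```

Whether a font renders ten stacked accents sensibly is a separate matter; the string itself stays valid.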

Also see: Combining character.

Edit: "Doesn't this kill the 'concept of runes'?"

Answer: No, because this is not a concept of runes. A rune is not a character; a rune is an integer value identifying a Unicode code point. A character may consist of a single Unicode code point, in which case 1 character is 1 rune. Most everyday use of runes fits this case, so in practice it rarely causes headaches. Characters made of several code points are a concept of the Unicode standard, not of Go.
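The "a rune is just an integer" point can be seen directly in code. In this small sketch (the function name `codePoints` is made up), a `[]rune` is returned as `[]int32` with no conversion, because `rune` is an alias for `int32`.

```go
package main

import "fmt"

// codePoints returns the Unicode code points of s as plain integers.
// rune is an alias for int32, so []rune(s) already has type []int32.
func codePoints(s string) []int32 {
	return []rune(s)
}

func main() {
	fmt.Println(codePoints("\u00e9"))  // precomposed 'é': one code point
	fmt.Println(codePoints("e\u0301")) // 'e' + combining acute: two code points
}
```

Both strings display as the same character, yet they hold different sequences of code points, which is exactly the distinction between a character and a rune.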