How do I detect when bytes cannot be converted to a string in Go?

Some byte sequences are invalid and can't be converted to Unicode strings. How do I detect that when converting a `[]byte` to a `string` in Go?
1 answer
- dpbf62565 2016-01-18 20:08
You can, as Tim Cooper noted, test UTF-8 validity with `utf8.Valid`.

But! You might be thinking that converting non-UTF-8 bytes to a Go `string` is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8, which you can print, access via indexing, or even round-trip back to a `[]byte` (to `Write`, say).

There are two places in the language where Go does do UTF-8 decoding of `string`s for you:

- when you do `for i, r := range s`, the `r` is a Unicode code point as a value of type `rune`
- when you do the conversion `[]rune(s)`, Go decodes the whole string to runes
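The difference between byte-level access and the two decoding points listed above can be sketched roughly as follows. The helper name `decodeWithRange` and the sample input are made up for illustration; only the language built-ins and `fmt` are real.

```go
package main

import "fmt"

// decodeWithRange is a hypothetical helper for this sketch. It records the
// byte offset and the rune that a range loop yields for each step.
func decodeWithRange(s string) (offsets []int, runes []rune) {
	for i, r := range s {
		offsets = append(offsets, i)
		runes = append(runes, r)
	}
	return offsets, runes
}

func main() {
	// Assumed sample input: 'a', one invalid byte (0xff), then "é" (2 bytes).
	b := []byte{'a', 0xff, 0xc3, 0xa9}
	s := string(b) // the conversion copies the bytes unchanged

	fmt.Println(len(s))       // byte length, not rune count
	fmt.Printf("%#x\n", s[1]) // indexing yields the raw byte, no decoding

	// range decodes UTF-8 as it goes.
	offsets, runes := decodeWithRange(s)
	for k := range offsets {
		fmt.Printf("offset %d: %U\n", offsets[k], runes[k])
	}

	// []rune(s) decodes the whole string in one step.
	fmt.Println(len([]rune(s)))
}
```

Indexing and `len` work on bytes, so the invalid byte is still there at `s[1]`; only the `range` loop and the `[]rune` conversion interpret the bytes as UTF-8.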
In both the `range` loop and the `[]rune` conversion, invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. More is in the spec sections on `for` statements and conversions between `string`s and other types. These conversions never crash, so you only need to actively check for UTF-8 validity if it's relevant to your application, like if you want to throw an error on mis-encoded input.

Since that behavior's baked into the language, you can expect it from libraries, too. U+FFFD is `utf8.RuneError` and is returned by functions in `utf8`.

Here's a sample program showing what Go does with a `[]byte` holding invalid UTF-8:

    package main

    import "fmt"

    func main() {
    	a := []byte{0xff} // 0xff can never appear in valid UTF-8
    	s := string(a)    // the conversion keeps the byte as-is
    	fmt.Println(s)
    	// range decodes the string; the invalid byte becomes U+FFFD
    	for _, r := range s {
    		fmt.Println(r)
    	}
    	// []rune(s) decodes the whole string the same way
    	rs := []rune(s)
    	fmt.Println(rs)
    }

Output will look different in different environments, but in the Playground it looks like

    �
    65533
    [65533]
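If the application does need to reject mis-encoded input, a minimal sketch could look like the one below. The names `toValidString`, `firstInvalidByte`, and `errInvalidUTF8` are hypothetical and not part of any library; `utf8.Valid`, `utf8.DecodeRune`, and `utf8.RuneError` are from the standard `unicode/utf8` package.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is a hypothetical sentinel error for this sketch.
var errInvalidUTF8 = errors.New("invalid UTF-8")

// toValidString converts b to a string only if b is valid UTF-8.
func toValidString(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", errInvalidUTF8
	}
	return string(b), nil
}

// firstInvalidByte returns the byte offset of the first invalid UTF-8
// sequence in b, or -1 if b is entirely valid.
func firstInvalidByte(b []byte) int {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// A decoding error is reported as (RuneError, 1). A genuine
		// U+FFFD in the input decodes with size 3, so it is not flagged.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	inputs := [][]byte{
		[]byte("héllo"),  // valid
		{'h', 0xff, 'i'}, // invalid byte at offset 1
		[]byte("\uFFFD"), // a real replacement character, still valid
	}
	for _, b := range inputs {
		s, err := toValidString(b)
		fmt.Printf("%q err=%v firstInvalid=%d\n", s, err, firstInvalidByte(b))
	}
}
```

Checking the decoded size matters because comparing against `utf8.RuneError` alone would also flag input that legitimately contains U+FFFD.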
This answer was selected by the asker as the best answer.