Reading a UTF-8 encoded string from an io.Reader

I am writing a small communication protocol over TCP sockets. I am able to read and write basic data types such as integers, but I have no idea how to read a UTF-8 encoded string from a slice of bytes.

The protocol client is written in Java and the server in Go.

From what I have read, Go runes are 32 bits long and UTF-8 characters are 1 to 4 bytes long, which makes it impossible to simply cast a byte slice to a string.

I'd like to know how I can read and write this UTF-8 stream.

Note that I know the byte length of the buffer by the time I read the string.

Answer by doutuo7815, 2013-11-25 22:06
Some theory first:

A rune in Go represents a Unicode code point: a number assigned to a particular character in Unicode. It's an alias for int32.

UTF-8 is a Unicode encoding: a format for representing Unicode code points for storage and transmission. UTF-8 uses 1 to 4 bytes to encode a single code point.
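To make the distinction between a code point and its encoded bytes concrete, here is a minimal sketch using the standard unicode/utf8 package. The helper name encodedLen is made up for this illustration; the four sample code points are arbitrary picks, one for each possible encoded length.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen (hypothetical helper) reports how many bytes UTF-8 needs
// to encode the code point r.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// Code points are written as escapes so the source itself stays ASCII.
	for _, r := range []rune{'a', '\u00e9', '\u20ac', '\U0001F600'} {
		buf := make([]byte, utf8.UTFMax)
		n := utf8.EncodeRune(buf, r) // n equals encodedLen(r) for valid runes
		fmt.Printf("U+%04X %q -> % x (%d bytes, RuneLen=%d)\n",
			r, r, buf[:n], n, encodedLen(r))
	}
}
```

The rune value is always a single 32-bit number, while the number of bytes on the wire depends on which code point it is.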

How this maps on Go data types:

Both []byte and string store a series of bytes (a byte in Go is an alias for uint8).

The chief difference is that strings are immutable, so while you can write

    b := make([]byte, 2)
    b[0] = byte('a')
    b[1] = byte('z')

you can't write

    var s string
    s[0] = byte('a') // does not compile: strings are immutable

This is further underlined by the fact that you cannot set a string's length explicitly (as in an imaginary s := make(string, 10)).

While strings in Go contain abstract bytes (you're free to store in them, say, characters encoded using Windows-1252), certain Go statements and type conversions interpret strings as being encoded in UTF-8, in particular:

A type conversion from string to []rune parses the string as a sequence of UTF-8-encoded code points and produces a slice of them. The reverse type conversion takes the Unicode code points from the slice of runes and produces a UTF-8-encoded string.

A range loop over a string loops through Unicode code points comprising the string, not just bytes.
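The two cases above can be sketched as follows. The function names runeCount and runeOffsets are invented for this example, and the sample string is arbitrary.

```go
package main

import "fmt"

// runeCount (hypothetical helper) counts code points by converting the
// string to []rune, which decodes it as UTF-8.
func runeCount(s string) int {
	return len([]rune(s))
}

// runeOffsets (hypothetical helper) returns the byte offset at which each
// code point starts, as reported by a range loop over the string.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "h\u00e9llo" // U+00E9 occupies two bytes in UTF-8

	fmt.Println(len(s), runeCount(s)) // 6 5
	fmt.Println(runeOffsets(s))       // [0 1 3 4 5]

	// []rune -> string encodes the code points as UTF-8.
	fmt.Println(len(string([]rune{0x4e16, 0x754c}))) // 6
}
```

Note that len(s) counts bytes, so it differs from the number of code points as soon as the string contains anything outside ASCII.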

Go also supplies the type conversions between string and []byte and back. Now recall that strings are read-only, while slices of bytes are not. This means a construct like

    b := make([]byte, 1000)
    _, err := io.ReadFull(r, b) // check err before using b
    s := string(b)              // copies the bytes into a new string

always copies the data, no matter if you convert a slice to a string or back. This wastes space but is type-safe and enforces the semantics.
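A small sketch that makes the copy observable: the slice is modified after the conversion, and the string keeps its original contents. The function name toStringThenMutate is made up for the demonstration.

```go
package main

import "fmt"

// toStringThenMutate (hypothetical helper) converts b to a string and then
// overwrites the first byte of b. The returned string is unaffected because
// the conversion copied the bytes.
func toStringThenMutate(b []byte) string {
	s := string(b)
	if len(b) > 0 {
		b[0] = 'X'
	}
	return s
}

func main() {
	b := []byte("abc")
	s := toStringThenMutate(b)
	fmt.Println(s, string(b)) // abc Xbc
}
```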

Now back to your task at hand.

If you work with reasonably small strings and are not under memory pressure, just convert the byte slices filled by your reader's Read method (or io.ReadFull, or whatever) to strings. Be sure to reuse the slice you read into to ease the pressure on the garbage collector: there is no need to allocate a new slice for each read, because the conversion copies the data out of it into a string anyway.
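Here is a minimal sketch of that approach. The question does not specify the wire format, so this assumes a hypothetical one: a 4-byte big-endian length followed by that many UTF-8 bytes. The names writeString, readString, roundTrip and the limit maxStringLen are all invented for the example, and the UTF-8 validity check is an optional extra rather than something the answer requires.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"unicode/utf8"
)

// maxStringLen is an arbitrary sanity limit chosen for this sketch.
const maxStringLen = 1 << 20

// writeString writes a 4-byte big-endian length followed by the UTF-8 bytes
// of s (assumed wire format, not taken from the question).
func writeString(w io.Writer, s string) error {
	if len(s) > maxStringLen {
		return errors.New("string too long")
	}
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(s)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := io.WriteString(w, s)
	return err
}

// readString reads one length-prefixed string from r. scratch is reused
// between calls and only reallocated when it is too small; the buffer to
// pass to the next call is returned alongside the string.
func readString(r io.Reader, scratch []byte) (string, []byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return "", scratch, err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > maxStringLen {
		return "", scratch, errors.New("string too long")
	}
	if cap(scratch) < int(n) {
		scratch = make([]byte, n)
	}
	buf := scratch[:n]
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", scratch, err
	}
	if !utf8.Valid(buf) {
		return "", scratch, errors.New("invalid UTF-8")
	}
	return string(buf), scratch, nil // the conversion copies buf
}

// roundTrip writes msgs into an in-memory buffer (standing in for the TCP
// connection) and reads them back with a single reused scratch buffer.
func roundTrip(msgs ...string) ([]string, error) {
	var conn bytes.Buffer
	for _, m := range msgs {
		if err := writeString(&conn, m); err != nil {
			return nil, err
		}
	}
	var out []string
	scratch := make([]byte, 0, 64)
	for range msgs {
		s, grown, err := readString(&conn, scratch)
		if err != nil {
			return nil, err
		}
		scratch = grown
		out = append(out, s)
	}
	return out, nil
}

func main() {
	out, err := roundTrip("h\u00e9llo", "\u4e16\u754c")
	fmt.Println(out, err)
}
```

On the Java side, this assumed format would correspond to DataOutputStream.writeInt with the length of s.getBytes(StandardCharsets.UTF_8), followed by writing those bytes. DataOutputStream.writeUTF is a different format: it uses a 2-byte length and Java's modified UTF-8, so it would not match this reader.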

Finally, if you absolutely must not copy the data (say, you're dealing with multi-megabyte strings and have tight memory requirements), you may try to play dirty tricks by unsafely working with memory: here is an example of how you might transplant the memory from a byte slice to a string. Note that should you resort to something like this, you must understand very well that it's free to break with any new release of Go, and it's not even guaranteed to work at all.
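For illustration only, one way to do this in current Go, assuming Go 1.20 or later, is with unsafe.String and unsafe.SliceData. These functions postdate the answer, so this is not necessarily the technique the answer had in mind, and the helper name bytesToStringNoCopy is made up. The resulting string shares memory with the slice, so the slice must never be modified afterwards.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToStringNoCopy (hypothetical helper, requires Go 1.20+) returns a
// string that shares b's backing memory instead of copying it.
// The caller must not modify b for as long as the string is in use,
// otherwise the "immutable" string would change underneath its users.
func bytesToStringNoCopy(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("large payload")
	s := bytesToStringNoCopy(b)
	fmt.Println(s)
	// b must be treated as read-only from here on.
}
```

This also rules out the buffer-reuse advice above: a slice whose memory has been handed to a string cannot be read into again.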
This answer was selected by the asker as the best answer.

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

Question asked 2013-11-25 14:07

回答 1 已采纳 Some theory first: A rune in Go represents a Unicode code point — a number assigned to a particu
如何使用ReadLine（）获取字符串输入？
2019-09-05 11:42

回答 2 已采纳 Get rid of the scanner (you already said you prefer ReadLine()) and change your nextInt() function
创建流以在GO中读取巨大的字符串 xml
2017-09-04 16:41

回答 1 已采纳 Turn any string into an io.Reader with the strings.NewReader method: reader := strings.NewReader(
java读取utf8_使用Java从文件读取UTF8数据
2021-03-03 11:58

CodeWizardess的博客通常，数据以位(1或0)的形式存储在计算机中。有多种可用的编码方案来指定每个字符代表的字节集。...UTF-8-它以8位为单位(字节)，UTF8中的字符长度可以从1到4个字节，从而使UTF8的宽度可变。UTF-...
sharding-jdbc，mybatis，查询一个json字段的数据，但是奇怪的是报String不能转成clob？ java 后端数据库
2022-05-11 10:05

回答 1 已采纳 mybatis的类型转换一般是BaseTypeHandler的子类，自己去debug吧，如果只是string类型，自己指定对应的typeHandle
串口转网络调试组手能正常接受到返回数据，但是java的socket获取不到返回值 java 其他网络
2023-01-13 23:41

回答 2 已采纳 client.shutdownOutput();是关闭服务器连接，关闭连接后在通过输入流读取in.read();就会造成题主说的"获取不到Socket的返回值"的问题。按照如下代码块修改即可： /
用java实现：上传excl表格。读取数据，输出结果。 java
2018-01-11 08:49

回答 7 已采纳 443193862@qq.com,楼主把你的代码、部署环境描述发过来，给你贴代码估计你系统也跑不起来，我可以给你做个简单上传demo，你自己根据业务再扩展
golang io.Reader和io.Writer
2022-05-27 17:01

charlie_wang007的博客 type Reader interface { Read(p []byte) (n int, err error) } type Writer interface { Write(p []byte) (n int, err error) } 实现reader和writer的模块： ...strings.Reader: 把字符串抽象成Rea
如何将下面这段java代码的生成签名部分用PHP重写？ php
2022-10-19 18:06

回答 2 已采纳 <?php $data = "this is a test"; $private_key = <<<EOD -----BEGIN PRIVATE KEY----- MI
为啥说是java的非法表达啊？ jar sql struts
2021-04-26 18:26

回答 5 已采纳加了分号也没用啊
Java编码的相关问题
2011-08-17 22:38

回答 3 已采纳这是由于properties文件中存的不是中文，就是英文字符“\u5206\u5272”
io.Reader 解析
2018-12-14 09:52

cbmljs的博客 io.Reader 是一个 Interface 类型，功能非常强大，在任何需要读的地方我们都尽量使用它。先来看下它的原型： type Reader interface { Read(p []byte) (n int, err error) } 可见，任何实现了 Read() 函数的...
dom4j增加已存在的节点，怎么增加，我的一直报错
2009-04-14 11:40

回答 1 已采纳 [code="java"] public boolean addCallBoardInfo(String name,String content){ Element in
[Go]将string转换为io.Reader类型
2021-02-08 15:31

程序员老狼的博客在使用很多函数的时候需要传入string字符串 , 但是函数参数类型是io.Reader , 这时候就需要将string转换为Reader类型例如下面的: strings.NewReader("aaaa") NewReader返回从读取的新Reader。它类似于bytes....
JAVA输出带BOM的UTF-8编码的文件
2019-05-28 12:54

玉标的博客当从http 的response输出CSV文件的时候，设置为utf8的时候默认...bom的，但是windows的Excel是使用bom来确认utf8编码的，所有需要把bom写到文件的开头。微软在 UTF-8 中使用 BOM 是因为这样可以把 UTF-8 和 ASCI...
没有解决我的问题, 去提问

悬赏问题

¥15 ETLCloud 处理json多层级问题
¥15 matlab中使用gurobi时报错
¥15 这个主板怎么能扩出一两个sata口
¥15 不是，这到底错哪儿了😭
¥15 2020长安杯与连接网探
¥15 关于#matlab#的问题：在模糊控制器中选出线路信息，在simulink中根据线路信息生成速度时间目标曲线（初速度为20m/s，15秒后减为0的速度时间图像）我想问线路信息是什么
¥15 banner广告展示设置多少时间不怎么会消耗用户价值
¥15 可见光定位matlab仿真
¥15 arduino 四自由度机械臂
¥15 wordpress 产品图片 GIF 没法显示

从io.Reader读取UTF-8编码的字符串

1条回答 默认 最新

悬赏问题

1条回答默认最新