Does slicing a string copy the underlying data?

I am trying to efficiently count runes from a utf-8 string using the utf8 library. Is this example optimal in that it does not copy the underlying data?
https://golang.org/pkg/unicode/utf8/#example_DecodeRuneInString

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    str := "Hello, 世界" // let's assume a runtime-provided string
    for len(str) > 0 {
        r, size := utf8.DecodeRuneInString(str)
        fmt.Printf("%c %v\n", r, size)
        str = str[size:] // performs copy?
    }
}

I found StringHeader in the (unsafe) reflect library. Is this the exact structure of a string in Go? If so, it is conceivable that slicing a string merely updates Data or allocates a new StringHeader altogether.

type StringHeader struct {
        Data uintptr
        Len  int
}

Bonus: where can I find the code that performs string slicing so that I could look it up myself? Any of these?
https://golang.org/src/runtime/slice.go
https://golang.org/src/runtime/string.go

This related SO answer suggests that runtime-strings incur a copy when converted from string to []byte.

Accepted answer by douxia6554, 2018-09-18 23:55
Slicing Strings

does slice of string perform copy of underlying data?

No, it does not. See this post by Russ Cox:

A string is represented in memory as a 2-word structure containing a pointer to the string data and a length. Because the string is immutable, it is safe for multiple strings to share the same storage, so slicing s results in a new 2-word structure with a potentially different pointer and length that still refers to the same byte sequence. This means that slicing can be done without allocation or copying, making string slices as efficient as passing around explicit indexes.

-- Go Data Structures

Slices, Performance, and Iterating Over Runes

A slice is basically three things: a length, a capacity, and a pointer to a location in an underlying array.

As such, slices themselves are not very large: two ints and a pointer (possibly some other small things in implementation detail). So the allocation required to make a copy of a slice header is very small, and doesn't depend on the size of the underlying array. And no new allocation is required when you simply update the length, capacity, and pointer location, such as on line 2 of:

foo := []int{3, 4, 5, 6}
foo = foo[1:]

Rather, it's when a new underlying array has to be allocated that a performance impact is felt.

Strings in Go are immutable. So to change a string you need to make a new string. However, strings are closely related to byte slices, e.g. you can create a byte slice from a string with

foo := `here's my string`
fooBytes := []byte(foo)

I believe that will allocate a new array of bytes, because:

a string is in effect a read-only slice of bytes

according to the Go Blog (see Strings, bytes, runes and characters in Go). In general you can use a slice to change the contents of an underlying array, so to produce a usable byte slice from a string you would have to make a copy to keep the user from changing what's supposed to be immutable.
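The copy can be observed from ordinary code. In this sketch (mutateCopy is a hypothetical helper name), the byte slice is modified after the conversion and the original string stays unchanged, which would not be possible if the two shared storage.

```go
package main

import "fmt"

// mutateCopy (hypothetical helper) converts s to a byte slice, changes the
// first byte, and returns both the original string and the modified copy.
func mutateCopy(s string) (original, modified string) {
	b := []byte(s) // conversion copies the bytes
	b[0] = 'H'
	return s, string(b)
}

func main() {
	original, modified := mutateCopy("here's my string")
	fmt.Println(original) // here's my string
	fmt.Println(modified) // Here's my string
}
```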

You could use performance profiling and benchmarking to gain further insight into the performance of your program.

Once you have your slice of bytes, fooBytes, reslicing it does not allocate a new array; it just produces a new slice header, which is small. This appears to be what slicing a string does as well.

Note that you don't need to use the utf8 package to count runes in a UTF-8 string, though you may proceed that way if you like. Go handles UTF-8 natively. However, if you want to iterate over characters you can't treat the string as a slice of bytes, because a character may occupy more than one byte. Instead you can represent it as a slice of runes:

foo := `here's my string`
fooRunes := []rune(foo)

This operation of converting a string to a slice of runes is fast in my experience (trivial in benchmarks I've done, but there may be an allocation). Now you can iterate across fooRunes to count runes, no utf8 package required. Alternatively, you can skip the explicit []rune(foo) conversion and do it implicitly by using a for ... range loop on the string, because those are special:

A for range loop, by contrast, decodes one UTF-8-encoded rune on each iteration. Each time around the loop, the index of the loop is the starting position of the current rune, measured in bytes, and the code point is its value.

-- Strings, bytes, runes and characters in Go