I have written a multi-goroutine version of merge sort in Go, along with a benchmark test. Now I want to use "go tool pprof" to analyze the bottleneck in my code.
After collecting the CPU profile, I ran "top10" in pprof and got the following output:
Showing nodes accounting for 4.73s, 98.54% of 4.80s total
Dropped 21 nodes (cum <= 0.02s)
Showing top 10 nodes out of 30
      flat  flat%   sum%        cum   cum%
     3.66s 76.25% 76.25%      3.66s 76.25%  pingcap/talentplan/tidb/common/alg/sort.Merge
     0.62s 12.92% 89.17%      0.64s 13.33%  pingcap/talentplan/tidb/mergesort.prepare
     0.17s  3.54% 92.71%      0.17s  3.54%  runtime.freedefer
     0.12s  2.50% 95.21%      0.14s  2.92%  pingcap/talentplan/tidb/common/alg/sort.quickSort
     0.10s  2.08% 97.29%      0.10s  2.08%  runtime.memmove
     0.03s  0.62% 97.92%      0.03s  0.62%  runtime.memclrNoHeapPointers
     0.03s  0.62% 98.54%      0.04s  0.83%  runtime.stackpoolalloc
         0     0% 98.54%      0.11s  2.29%  pingcap/talentplan/tidb/common/alg/sort.MergeSortByMultiGoroutine
         0     0% 98.54%      0.14s  2.92%  pingcap/talentplan/tidb/common/alg/sort.QuickSort
         0     0% 98.54%      4.04s 84.17%  pingcap/talentplan/tidb/common/alg/sort.mergeSortByMultiGoroutine
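(For reference, the benchmark driving this profile is shaped roughly like the sketch below. It is simplified, and the exact signatures of prepare and MergeSortByMultiGoroutine are illustrative here, not copied from my code; the real benchmark is in the repository linked at the end. prepare fills the slice with random data inside the timed region, which is why it shows up in the profile above.)

package mergesort

import (
	"math/rand"
	"testing"

	"pingcap/talentplan/tidb/common/alg/sort"
)

// prepare refills arr with fresh random data before every sort, so it
// runs inside the timed region and appears in the profile.
func prepare(arr []int64) {
	for i := range arr {
		arr[i] = rand.Int63()
	}
}

func BenchmarkMergeSortByMultiGoroutine(b *testing.B) {
	const n = 1 << 24 // a large input, so Merge dominates the samples
	arr := make([]int64, n)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		prepare(arr)
		sort.MergeSortByMultiGoroutine(arr)
	}
}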
Looking at that output, I think the bottleneck is in sort.Merge, so I ran "list Merge" to dive into the method and found the following:
         .          .     50:func Merge(arr []int64, start int, mid int, end int, tmp []int64) {
         .          .     51:    index, i, j := start, start, mid + 1
      80ms       80ms     52:    for ; i <= mid && j <= end; index++ {
     180ms      180ms     53:        if arr[i] <= arr[j] {
     1.58s      1.58s     54:            tmp[index] = arr[i]
      50ms       50ms     55:            i++
         .          .     56:        } else {
     1.52s      1.52s     57:            tmp[index] = arr[j]
      20ms       20ms     58:            j++
         .          .     59:        }
         .          .     60:    }
         .          .     61:    for ; i <= mid; index++ {
         .          .     62:        tmp[index] = arr[i]
         .          .     63:        i++
         .          .     64:    }
         .          .     65:    for ; j <= end; index++ {
         .          .     66:        tmp[index] = arr[j]
         .          .     67:        j++
         .          .     68:    }
         .          .     69:
      60ms       60ms     70:    for i := start; i <= end; i++ {
     170ms      170ms     71:        arr[i] = tmp[i]
         .          .     72:    }
         .          .     73:}
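(For context, the concurrent driver follows the usual fork-join pattern: sort the two halves in parallel goroutines, wait, then Merge. The sketch below is a simplified illustration of that structure, not my exact code; the depth cutoff and the quickSort signature are assumptions, and the real version is in the repository linked below.)

package sort

import "sync"

// Simplified sketch of the fork-join driver. quickSort and Merge are
// the functions from this package that appear in the profile above.
func mergeSortByMultiGoroutine(arr []int64, start, end int, tmp []int64, depth int) {
	if start >= end {
		return
	}
	if depth <= 0 {
		// Deep enough: stop spawning goroutines and sort sequentially.
		quickSort(arr, start, end)
		return
	}
	mid := (start + end) / 2
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		mergeSortByMultiGoroutine(arr, start, mid, tmp, depth-1)
	}()
	go func() {
		defer wg.Done()
		mergeSortByMultiGoroutine(arr, mid+1, end, tmp, depth-1)
	}()
	wg.Wait()
	// Both halves are now sorted; Merge combines them. This call is
	// where the profile attributes most of the time.
	Merge(arr, start, mid, end, tmp)
}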
Here is what confuses me. The Merge method contains four for-loops. The 1st and the 4th for-loops operate on approximately the same scale, and both of them just copy elements from one slice to another. So why does the 1st for-loop cost so much (1.58s plus 1.52s) while the 4th for-loop costs so little (just 170ms)? It seems counter-intuitive!
The project's GitHub address is https://github.com/Duncan15/talent-plan/tree/master/tidb/mergesort. You can run "make pprof" to execute the benchmark test and generate the CPU and memory profiles.
I want to know why this happens. If you have time, please read my code and give me some advice.
Thank you!
I have written some code to verify that when the Merge method runs in a single-goroutine environment, the 1st for-loop's cost is approximately the same as the 4th's, which seems intuitive. So I wondered whether the multi-goroutine environment causes the phenomenon above. In the multi-goroutine version, Merge runs concurrently; in other words, the 1st for-loop and the 4th for-loop run concurrently. If reading and writing slices concurrently increased the cost, the 4th for-loop's cost would have to increase as well, yet the pprof output shows that only the 1st for-loop's cost increases!
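(The single-goroutine check is essentially the sketch below: build two sorted halves that interleave, call Merge in a hot loop under the CPU profiler, and compare the per-line costs. The file name, input size, and iteration count are illustrative.)

package main

import (
	"os"
	"runtime/pprof"

	"pingcap/talentplan/tidb/common/alg/sort"
)

// fill puts even values in the first half and odd values in the second,
// so the merge loop alternates between both branches of the comparison.
func fill(arr []int64, mid int) {
	for i := 0; i <= mid; i++ {
		arr[i] = int64(2 * i)
	}
	for j := mid + 1; j < len(arr); j++ {
		arr[j] = int64(2*(j-mid) - 1)
	}
}

func main() {
	f, err := os.Create("merge_single.prof")
	if err != nil {
		panic(err)
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	const n = 1 << 22
	arr := make([]int64, n)
	tmp := make([]int64, n)
	mid := n/2 - 1
	for iter := 0; iter < 100; iter++ {
		fill(arr, mid)
		sort.Merge(arr, 0, mid, n-1, tmp)
	}
}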
I also wrote another test to verify this idea; you can run it with "make vet". This test runs the Merge method concurrently. The difference from the multi-goroutine merge sort is that the test contains no sorting code at all, only merging. And to my surprise, in this test the 1st for-loop's cost is approximately the same as the 4th's. So now I am completely confused!
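(That merge-only test is shaped roughly like the sketch below: each goroutine merges its own disjoint slices, so the concurrent Merge calls share nothing at the Go level. The worker count, input size, and iteration count are illustrative.)

package main

import (
	"sync"

	"pingcap/talentplan/tidb/common/alg/sort"
)

func main() {
	const workers = 8
	const n = 1 << 22
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each goroutine owns its own arr and tmp, so the Merge
			// calls never touch shared memory.
			arr := make([]int64, n)
			tmp := make([]int64, n)
			mid := n/2 - 1
			for iter := 0; iter < 50; iter++ {
				// Even values in the first half, odd in the second,
				// so the merge loop exercises both branches.
				for i := 0; i <= mid; i++ {
					arr[i] = int64(2 * i)
				}
				for j := mid + 1; j < n; j++ {
					arr[j] = int64(2*(j-mid) - 1)
				}
				sort.Merge(arr, 0, mid, n-1, tmp)
			}
		}()
	}
	wg.Wait()
}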