donglefu6195 2017-01-01 19:13

# Golang：为什么使用goroutines并行化调用会导致速度变慢？

I have two versions of merge sort implementation. The first is a "normal" version and the second uses `goroutines` that parallelize the work being done on each subset of the slice in each step of the recursion.

One would assume that being able to parallelize this work would make the concurrent implementation faster: if I need to work on slice A and slice B, then working on them concurrently should be faster than doing so synchronously.

Now I'm assuming something is wrong with either my implementation of my understanding, because my concurrent version ends up being 13-14x slower than the sync version.

Can anyone point me in the right direction as to what I'm missing?

"Normal" (synchronous implementation):

``````// MergeSort sorts the slice s using Merge Sort Algorithm
func MergeSort(s []int) []int {
if len(s) <= 1 {
return s
}

n := len(s) / 2

var l []int
var r []int

l = MergeSort(s[:n])
r = MergeSort(s[n:])

return merge(l, r)
}
``````

"Concurrent" version:

``````// MergeSortMulti sorts the slice s using Merge Sort Algorithm
func MergeSortMulti(s []int) []int {
if len(s) <= 1 {
return s
}

n := len(s) / 2

wg := sync.WaitGroup{}
wg.Add(2)

var l []int
var r []int

go func() {
l = MergeSortMulti(s[:n])
wg.Done()
}()

go func() {
r = MergeSortMulti(s[n:])
wg.Done()
}()

wg.Wait()
return merge(l, r)
}
``````

Both use the same `merge` function:

``````func merge(l, r []int) []int {
ret := make([]int, 0, len(l)+len(r))
for len(l) > 0 || len(r) > 0 {
if len(l) == 0 {
return append(ret, r...)
}
if len(r) == 0 {
return append(ret, l...)
}
if l[0] <= r[0] {
ret = append(ret, l[0])
l = l[1:]
} else {
ret = append(ret, r[0])
r = r[1:]
}
}
return ret
}
``````

This is my benchmarking code:

``````package msort

import "testing"

var a []int

func init() {
for i := 0; i < 1000000; i++ {
a = append(a, i)
}
}
func BenchmarkMergeSortMulti(b *testing.B) {
for n := 0; n < b.N; n++ {
MergeSortMulti(a)
}
}

func BenchmarkMergeSort(b *testing.B) {
for n := 0; n < b.N; n++ {
MergeSort(a)
}
}
``````

It reveals that the concurrent version is a lot slower than the normal synchronous version:

``````BenchmarkMergeSortMulti-8              1    1711428093 ns/op
BenchmarkMergeSort-8                  10     131232885 ns/op
``````
• 写回答

#### 2条回答默认 最新

• dongxiongshi9952 2017-01-01 20:00
关注

This is because you spawn tons of goroutines which preempt when calling `wg.Wait()`. Scheduler has no idea which one to pick, it can pick randomly blocked ones till it meets one that finally can return and unblock another one. When I limited number of concurrent calls to MergeSortMulti it became roughly 3x faster than synchronous version.

This code isn't beautiful, but it is a proof.

``````// MergeSortMulti sorts the slice s using Merge Sort Algorithm
func MergeSortMulti(s []int) []int {
if len(s) <= 1 {
return s
}

n := len(s) / 2

wg := sync.WaitGroup{}
wg.Add(2)

var l []int
var r []int

const N = len(s)
const FACTOR = 8  // ugly but easy way to limit number of goroutines

go func() {
if n < N/FACTOR {
l = MergeSort(s[:n])
} else {
l = MergeSortMulti(s[:n])
}
wg.Done()
}()

go func() {
if n < N/FACTOR {
r = MergeSort(s[n:])
} else {
r = MergeSortMulti(s[n:])
}
wg.Done()
}()

wg.Wait()
return merge(l, r)
}
``````

Results will be different on your machine, but:

FACTOR = 4:

``````BenchmarkMergeSortMulti-8             50          33268370 ns/op
BenchmarkMergeSort-8                  20          91479573 ns/op
``````

FACTOR = 10000

``````BenchmarkMergeSortMulti-8             20          84822824 ns/op
BenchmarkMergeSort-8                  20         103068609 ns/op
``````

FACTOR = N/4

``````BenchmarkMergeSortMulti-8              3         352870855 ns/op
BenchmarkMergeSort-8                  10         129107177 ns/op
``````

Bonus: You can do also use semaphore to limit the number of goroutines which is a bit slower on my machine (select is used to avoid dead-lock):

``````var sem = make(chan struct{}, 100)

// MergeSortMulti sorts the slice s using Merge Sort Algorithm
func MergeSortMulti(s []int) []int {
if len(s) <= 1 {
return s
}

n := len(s) / 2

wg := sync.WaitGroup{}
wg.Add(2)

var l []int
var r []int

select {
case sem <- struct{}{}:
go func() {
l = MergeSortMulti(s[:n])
<-sem
wg.Done()
}()
default:
l = MergeSort(s[:n])
wg.Done()
}

select {
case sem <- struct{}{}:
go func() {
r = MergeSortMulti(s[n:])
<-sem
wg.Done()
}()
default:
r = MergeSort(s[n:])
wg.Done()
}

wg.Wait()
return merge(l, r)
}
``````

It yields:

``````BenchmarkMergeSortMulti-8             30          36741152 ns/op
BenchmarkMergeSort-8                  20          90843812 ns/op
``````
本回答被题主选为最佳回答 , 对您是否有帮助呢?
评论

#### 悬赏问题

• ¥15 相同型号电脑与配置，发现主板有一台貌似缺少了好多元器件似的，会影响稳定性和使用寿命吗？
• ¥15 要求编写稀疏矩阵A的转置矩阵的算法
• ¥15 编写满足以下要求的停车场管理程序，设停车场只有一个可停放n辆车的狭窄通道且只有一个大门可供车辆进出。
• ¥20 powerbulider 导入excel文件，显示不完整
• ¥15 用keil调试程序保证结果进行led相关闪烁
• ¥15 paddle训练自己的数据loss降不下去
• ¥20 用matlab的pdetool解决以下三个问题
• ¥15 单个福来轮的平衡与侧向滑动是如何做到的？
• ¥15 嵌入式Linux固件，能直接告诉我crc32校验的区域在哪不，内核的校验我已经找到了，uboot没有
• ¥20 h3c静态路要求有详细过程