I have two versions of merge sort implementation. The first is a "normal" version and the second uses goroutines
that parallelize the work being done on each subset of the slice in each step of the recursion.
One would assume that being able to parallelize this work would make the concurrent implementation faster: if I need to work on slice A and slice B, then working on them concurrently should be faster than doing so synchronously.
Now I'm assuming something is wrong with either my implementation of my understanding, because my concurrent version ends up being 13-14x slower than the sync version.
Can anyone point me in the right direction as to what I'm missing?
"Normal" (synchronous implementation):
// MergeSort sorts the slice s using Merge Sort Algorithm
func MergeSort(s []int) []int {
if len(s) <= 1 {
return s
}
n := len(s) / 2
var l []int
var r []int
l = MergeSort(s[:n])
r = MergeSort(s[n:])
return merge(l, r)
}
"Concurrent" version:
// MergeSortMulti sorts the slice s using Merge Sort Algorithm
func MergeSortMulti(s []int) []int {
if len(s) <= 1 {
return s
}
n := len(s) / 2
wg := sync.WaitGroup{}
wg.Add(2)
var l []int
var r []int
go func() {
l = MergeSortMulti(s[:n])
wg.Done()
}()
go func() {
r = MergeSortMulti(s[n:])
wg.Done()
}()
wg.Wait()
return merge(l, r)
}
Both use the same merge
function:
func merge(l, r []int) []int {
ret := make([]int, 0, len(l)+len(r))
for len(l) > 0 || len(r) > 0 {
if len(l) == 0 {
return append(ret, r...)
}
if len(r) == 0 {
return append(ret, l...)
}
if l[0] <= r[0] {
ret = append(ret, l[0])
l = l[1:]
} else {
ret = append(ret, r[0])
r = r[1:]
}
}
return ret
}
This is my benchmarking code:
package msort
import "testing"
var a []int
func init() {
for i := 0; i < 1000000; i++ {
a = append(a, i)
}
}
func BenchmarkMergeSortMulti(b *testing.B) {
for n := 0; n < b.N; n++ {
MergeSortMulti(a)
}
}
func BenchmarkMergeSort(b *testing.B) {
for n := 0; n < b.N; n++ {
MergeSort(a)
}
}
It reveals that the concurrent version is a lot slower than the normal synchronous version:
BenchmarkMergeSortMulti-8 1 1711428093 ns/op
BenchmarkMergeSort-8 10 131232885 ns/op