donglefu6195 2017-01-01 19:13
63 views
Accepted

Golang: Why does parallelizing calls with goroutines cause a slowdown?

I have two versions of merge sort implementation. The first is a "normal" version and the second uses goroutines that parallelize the work being done on each subset of the slice in each step of the recursion.

One would assume that being able to parallelize this work would make the concurrent implementation faster: if I need to work on slice A and slice B, then working on them concurrently should be faster than doing so synchronously.

Now I'm assuming something is wrong with either my implementation or my understanding, because my concurrent version ends up being 13-14x slower than the sync version.

Can anyone point me in the right direction as to what I'm missing?

"Normal" (synchronous implementation):

// MergeSort sorts the slice s using Merge Sort Algorithm
func MergeSort(s []int) []int {
    if len(s) <= 1 {
        return s
    }

    n := len(s) / 2

    var l []int
    var r []int

    l = MergeSort(s[:n])
    r = MergeSort(s[n:])

    return merge(l, r)
}

"Concurrent" version:

// MergeSortMulti sorts the slice s using Merge Sort Algorithm
func MergeSortMulti(s []int) []int {
    if len(s) <= 1 {
        return s
    }

    n := len(s) / 2

    wg := sync.WaitGroup{}
    wg.Add(2)

    var l []int
    var r []int

    go func() {
        l = MergeSortMulti(s[:n])
        wg.Done()
    }()

    go func() {
        r = MergeSortMulti(s[n:])
        wg.Done()
    }()

    wg.Wait()
    return merge(l, r)
}

Both use the same merge function:

func merge(l, r []int) []int {
    ret := make([]int, 0, len(l)+len(r))
    for len(l) > 0 || len(r) > 0 {
        if len(l) == 0 {
            return append(ret, r...)
        }
        if len(r) == 0 {
            return append(ret, l...)
        }
        if l[0] <= r[0] {
            ret = append(ret, l[0])
            l = l[1:]
        } else {
            ret = append(ret, r[0])
            r = r[1:]
        }
    }
    return ret
}

This is my benchmarking code:

package msort

import "testing"

var a []int

func init() {
    for i := 0; i < 1000000; i++ {
        a = append(a, i)
    }
}
func BenchmarkMergeSortMulti(b *testing.B) {
    for n := 0; n < b.N; n++ {
        MergeSortMulti(a)
    }
}

func BenchmarkMergeSort(b *testing.B) {
    for n := 0; n < b.N; n++ {
        MergeSort(a)
    }
}

It reveals that the concurrent version is a lot slower than the normal synchronous version:

BenchmarkMergeSortMulti-8              1    1711428093 ns/op
BenchmarkMergeSort-8                  10     131232885 ns/op

2 Answers

  • dongxiongshi9952 2017-01-01 20:00

    This is because you spawn tons of goroutines, which get preempted when calling wg.Wait(). The scheduler has no idea which one to pick; it can keep picking randomly blocked ones until it meets one that can finally return and unblock another. When I limited the number of concurrent calls to MergeSortMulti, it became roughly 3x faster than the synchronous version.

    This code isn't beautiful, but it proves the point.

    // MergeSortMulti sorts the slice s using Merge Sort Algorithm
    func MergeSortMulti(s []int) []int {
        if len(s) <= 1 {
            return s
        }
    
        n := len(s) / 2
    
        wg := sync.WaitGroup{}
        wg.Add(2)
    
        var l []int
        var r []int
    
        N := len(s)
        const FACTOR = 8 // ugly but easy way to limit the number of goroutines
    
        go func() {
            if n < N/FACTOR {
                l = MergeSort(s[:n])
            } else {
                l = MergeSortMulti(s[:n])
            }
            wg.Done()
        }()
    
        go func() {
            if n < N/FACTOR {
                r = MergeSort(s[n:])
            } else {
                r = MergeSortMulti(s[n:])
            }
            wg.Done()
        }()
    
        wg.Wait()
        return merge(l, r)
    }
    

    Results will be different on your machine, but:

    FACTOR = 4:

    BenchmarkMergeSortMulti-8             50          33268370 ns/op
    BenchmarkMergeSort-8                  20          91479573 ns/op
    

    FACTOR = 10000

    BenchmarkMergeSortMulti-8             20          84822824 ns/op
    BenchmarkMergeSort-8                  20         103068609 ns/op
    

    FACTOR = N/4

    BenchmarkMergeSortMulti-8              3         352870855 ns/op
    BenchmarkMergeSort-8                  10         129107177 ns/op
    

    Bonus: You can also use a semaphore to limit the number of goroutines, which is a bit slower on my machine (select is used to avoid deadlock):

    var sem = make(chan struct{}, 100)
    
    // MergeSortMulti sorts the slice s using Merge Sort Algorithm
    func MergeSortMulti(s []int) []int {
        if len(s) <= 1 {
            return s
        }
    
        n := len(s) / 2
    
        wg := sync.WaitGroup{}
        wg.Add(2)
    
        var l []int
        var r []int
    
        select {
        case sem <- struct{}{}:
            go func() {
                l = MergeSortMulti(s[:n])
                <-sem
                wg.Done()
            }()
        default:
            l = MergeSort(s[:n])
            wg.Done()
        }
    
        select {
        case sem <- struct{}{}:
            go func() {
                r = MergeSortMulti(s[n:])
                <-sem
                wg.Done()
            }()
        default:
            r = MergeSort(s[n:])
            wg.Done()
        }
    
        wg.Wait()
        return merge(l, r)
    }
    

    It yields:

    BenchmarkMergeSortMulti-8             30          36741152 ns/op
    BenchmarkMergeSort-8                  20          90843812 ns/op
    
    This answer was accepted by the asker as the best answer.
  • donglin9717 2017-01-04 12:08

    Your assumption is not correct:

    One would assume that being able to parallelize this work would make the concurrent implementation faster: if I need to work on slice A and slice B, then working on them concurrently should be faster than doing so synchronously.

    All parallel software falls under Amdahl's Law (on Wikipedia), which I could paraphrase as 'the sequential setup is not free'.

    This is of course especially so when only using a single CPU core. However, it still matters even with multiple cores, for which the choreography and distribution of units of work across cores needs to be thought through if high performance is the aim. Fortunately, kopiczko's answer provides some good tips in the specific case in question.

    This has been a research topic for decades: see e.g. an old (but still relevant) summary of tricks of the trade in "Practical Parallel Processing: An Introduction to Problem Solving in Parallel" by Tidmus and Chalmers.
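    As a rough illustration of Amdahl's Law (the formula is standard; the numbers and code are my own sketch, not from the answer): the maximum speedup on n workers is S = 1 / ((1-p) + p/n), where p is the fraction of the work that can run in parallel.

```go
package main

import "fmt"

// amdahl returns the theoretical upper bound on speedup for a program
// whose parallel fraction is p, run on n workers: 1 / ((1-p) + p/n).
func amdahl(p, n float64) float64 {
	return 1 / ((1 - p) + p/n)
}

func main() {
	// Even if 90% of the sort parallelized perfectly, 8 cores could give
	// at most ~4.7x -- and per-goroutine overhead only subtracts from that.
	fmt.Printf("%.2f\n", amdahl(0.9, 8)) // 4.71
}
```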

