Most efficient number of goroutines on this machine (asked 2017-06-27 01:34)

I have a concurrent quicksort implementation that I wrote myself. It looks like this:

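package main

import "sync"

// sem is a buffered channel used as a counting semaphore: its capacity is the
// maximum number of extra sorting goroutines (20 here; 100, 50, 20 and 10 were tried).
var sem = make(chan bool, 20)

// swapArray swaps A[i] and A[j].
// (Assumed implementation; the original helper was not shown.)
func swapArray(A []int, i, j int) {
    A[i], A[j] = A[j], A[i]
}

// MedianOf3 returns the index of the median of A[p], A[mid] and A[r].
// (Assumed implementation; the original helper was not shown.)
func MedianOf3(A []int, p, r int) int {
    m := p + (r-p)/2
    if (A[p] <= A[m]) == (A[m] <= A[r]) {
        return m
    }
    if (A[m] <= A[p]) == (A[p] <= A[r]) {
        return p
    }
    return r
}

// Partition (Lomuto scheme) moves a median-of-three pivot into place and
// returns its final index q, so that A[p..q-1] <= A[q] < A[q+1..r].
func Partition(A []int, p int, r int) int {
    index := MedianOf3(A, p, r)
    swapArray(A, index, r) // park the pivot at the end
    x := A[r]
    j := p - 1
    for i := p; i < r; i++ {
        if A[i] <= x {
            j++
            A[j], A[i] = A[i], A[j]
        }
    }
    swapArray(A, j+1, r) // move the pivot between the two halves
    return j + 1
}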


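// ConcurrentQuicksort sorts A[p..r] in place. Each half is sorted on a new
// goroutine if a semaphore slot is free, otherwise sequentially by Quicksort.
func ConcurrentQuicksort(A []int, p int, r int) {
    var wg sync.WaitGroup
    if p < r {
        q := Partition(A, p, r)
        select {
        case sem <- true: // got a slot: sort the left half in the background
            wg.Add(1)
            go func() {
                ConcurrentQuicksort(A, p, q-1)
                <-sem // release the slot
                wg.Done()
            }()
        default: // limit reached: sort the left half on this goroutine
            Quicksort(A, p, q-1)
        }
        select {
        case sem <- true: // same for the right half
            wg.Add(1)
            go func() {
                ConcurrentQuicksort(A, q+1, r)
                <-sem
                wg.Done()
            }()
        default:
            Quicksort(A, q+1, r)
        }
    }
    wg.Wait() // wait for any halves handed to other goroutines
}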

// Quicksort is the plain sequential version: recursive calls, no goroutines.
func Quicksort(A []int, p int, r int) {
    if p < r {
        q := Partition(A, p, r)
        Quicksort(A, p, q-1)
        Quicksort(A, q+1, r)
    }
}

I have a buffered channel, sem, which I use to limit the number of goroutines running: if it reaches that limit, I don't start another goroutine and just run the normal quicksort on the subarray instead. I started with a limit of 100, then changed it to 50 and then 20, and the benchmarks got slightly better each time. But after switching to 10, the times started to go back up. So there seems to be some number, at least on my hardware, at which the algorithm runs most efficiently.

When I was implementing this, I saw a Stack Overflow question about the best number of goroutines, and now I cannot find it (Chrome's history doesn't seem to keep every site I visit). Do you know how to calculate such a thing? Ideally I wouldn't have to hardcode it at all and could let the program work it out by itself.

P.S. I also have a non-concurrent Quicksort, which runs about 1.7x slower than this. As you can see in my code, I fall back to Quicksort when the number of running goroutines reaches the limit I set. I wondered about calling ConcurrentQuicksort there instead, as a plain call without the go keyword: if other goroutines finished their work in the meantime, that ConcurrentQuicksort call could start launching goroutines again and speed things up (Quicksort, as you can see, only makes plain recursive calls without goroutines). I tried that, and the time was actually about 10% slower than with the regular Quicksort. Do you know why that would happen?

doukongpao0903 2017-06-27 03:19
You have to experiment a bit with this stuff, but I don't think the main concern is how many goroutines are running at once. As the answer @reticentroot linked to says, it's not necessarily a problem to run a lot of simultaneous goroutines.

I think your main concern should be the total number of goroutine launches. The current implementation could, in principle, start a goroutine to sort just a few items, and that goroutine would spend much more time on startup and coordination than on actual sorting.

The ideal is to start only as many goroutines as you need to get good utilization of all your CPUs. If your work items are roughly equal in size and your cores are roughly equally busy, then starting one task per core is perfect.

Here, tasks aren't evenly sized, so you might split the sort into somewhat more tasks than you have CPUs and distribute them. (In production you would typically use a worker pool to distribute work without starting a new goroutine for every task, but I think we can get away with skipping that here.)

To get a workable number of tasks, enough to keep all cores busy but not so many that you create lots of overhead, you can set a minimum size (say, the initial array size / 100) and only split off sorts of subarrays larger than that.

In slightly more detail, there is a bit of cost every time you send a task off to the background. For starters:

- Each goroutine launch spends a little time setting up the stack and doing scheduler bookkeeping.
- Each task switch spends some time in the scheduler and may incur cache misses when the two goroutines are looking at different code or data.
- Your own coordination code (channel sends and sync operations) takes time.

Other things can prevent ideal speedups from happening: you could hit a systemwide limit on e.g. memory bandwidth as Volker pointed out, some sync costs can increase as you add cores, and you can run into various trickier issues sometimes. But the setup, switching, and coordination costs are a good place to start.

The benefit that can outweigh the coordination costs is, of course, other CPUs getting work done when they'd otherwise sit idle.

I think, but haven't tested, that your problems at 50 goroutines are that 1) you reached nearly full utilization long before that, so adding more tasks adds more coordination work without making things go faster, and 2) you're creating goroutines for tiny sorts, which may spend more of their time setting up and coordinating than actually sorting. At 10 goroutines, your problem might be that you're no longer achieving full CPU utilization.

If you wanted, you could test those theories by counting the number of total goroutine launches at various goroutine limits (in an atomic global counter) and measuring CPU utilization at various limits (e.g. by running your program under the Linux/UNIX time utility).

The approach I'd suggest for a divide-and-conquer problem like this is to fork off a goroutine only for large enough subproblems (for quicksort, that means large enough subarrays). You can try different limits: maybe you only start goroutines for pieces that are more than 1/64th of the original array, or pieces above some static threshold like 1000 items.

I suspect you meant this sort routine as an exercise, but there are various things you can do to make your sorts faster or more robust against weird inputs. The standard library sort falls back to insertion sort for small subarrays and uses heapsort for the unusual data patterns that cause quicksort problems.

You can also look at other algorithms, like radix sort, for all or part of the sorting, which I played with. That sorting library is also parallel. I wound up using a minimum cutoff of 127 items before I'd hand a subarray off for other goroutines to sort, and I used an arrangement with a fixed pool of goroutines and a buffered chan to pass tasks between them. That produced decent practical speedups back then, though it was likely not the best approach even at the time, and I'm almost sure it isn't with today's Go scheduler. Experimentation is fun!
This answer was accepted by the asker as the best answer.

