drdu53813 2015-03-20 16:29
浏览 69
已采纳

在并行quicksort实现中使用go例程时,性能较差

Note: The "Go-lang parallel segment runs slower than series segment" question dealt with race conditions, this one has another issue, so imho it's not a duplicate.

I'm trying to find an explanation for the following situation: Running parallel quicksort results in a significantly longer runtime when done using go routines.

Benchmarks are after the code:

package c9sort

import (
    "time"
)

var runInParllel bool

func Quicksort(nums []int, parallel bool) ([]int, int) {
    started := time.Now()
    ch := make(chan int)
    runInParllel = parallel

    go quicksort(nums, ch)

    sorted := make([]int, len(nums))
    i := 0
    for next := range ch {
        sorted[i] = next
        i++
    }
    return sorted, int(time.Since(started).Nanoseconds() / 1000000)
}

func quicksort(nums []int, ch chan int) {

    // Choose first number as pivot
    pivot := nums[0]

    // Prepare secondary slices
    smallerThanPivot := make([]int, 0)
    largerThanPivot := make([]int, 0)

    // Slice except pivot
    nums = nums[1:]

    // Go over slice and sort
    for _, i := range nums {
        switch {
        case i <= pivot:
            smallerThanPivot = append(smallerThanPivot, i)
        case i > pivot:
            largerThanPivot = append(largerThanPivot, i)
        }
    }

    var ch1 chan int
    var ch2 chan int

    // Now do the same for the two slices
    if len(smallerThanPivot) > 1 {
        ch1 = make(chan int, len(smallerThanPivot))
        if runInParllel {
            go quicksort(smallerThanPivot, ch1)
        } else {
            quicksort(smallerThanPivot, ch1)
        }
    }
    if len(largerThanPivot) > 1 {
        ch2 = make(chan int, len(largerThanPivot))
        if runInParllel {
            go quicksort(largerThanPivot, ch2)
        } else {
            quicksort(largerThanPivot, ch2)
        }
    }

    // Wait until the sorting finishes for the smaller slice
    if len(smallerThanPivot) > 1 {
        for i := range ch1 {
            ch <- i
        }
    } else if len(smallerThanPivot) == 1 {
        ch <- smallerThanPivot[0]
    }
    ch <- pivot

    if len(largerThanPivot) > 1 {
        for i := range ch2 {
            ch <- i
        }
    } else if len(largerThanPivot) == 1 {
        ch <- largerThanPivot[0]
    }

    close(ch)
}

Benchmarks for a random perm of 500000 integers:

Ran 100 times

Non parallel average - 1866ms

Parallel average - 2437ms

Any explanation would be appreciated. I know goroutines may not be best for this kind of parallelism, but I'm trying to understand the reason.

Thank you in advance.

  • 写回答

2条回答 默认 最新

  • dougang1967 2015-03-21 05:05
    关注

    Turns out it was very simple. As I'm on a new machine, the GOMAXPROCS variable wasn't set.

    The new benchmark favors, as predicted, the parallel implementation: Set to double the number of cores:

    Using 16 goroutines
    Ran 100 times
    
    Non parallel average - 1980
    Parallel average - 1133
    

    Set to the number of cores:

    Using 8 goroutines
    Ran 100 times
    
    Non parallel average - 2004
    Parallel average - 1197
    

    By the way, this is fairly consistent. The average for double the number of cores is always a bit better.

    Benchmark for a larger collection (1000000):

    Using 8 goroutines
    Ran 100 times
    
    Non parallel average - 3748
    Parallel average - 2265
    

    With double:

    Using 16 goroutines
    Ran 100 times
    
    Non parallel average - 3817
    Parallel average - 2012
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥20 阿里云python代码求解
  • ¥20 有人知道这种图怎么画吗?
  • ¥15 pyqt6如何引用qrc文件加载里面的的资源
  • ¥15 安卓JNI项目使用lua上的问题
  • ¥20 RL+GNN解决人员排班问题时梯度消失
  • ¥15 统计大规模图中的完全子图问题
  • ¥15 使用LM2596制作降压电路,一个能运行,一个不能
  • ¥60 要数控稳压电源测试数据
  • ¥15 能帮我写下这个编程吗
  • ¥15 ikuai客户端l2tp协议链接报终止15信号和无法将p.p.p6转换为我的l2tp线路