dongyu9667 2015-09-16 06:03

Why does this program run faster when fewer threads are allocated?

I have a fairly simple Go program designed to compute random Fibonacci numbers to test some strange behavior I observed in a worker pool I wrote. When I allocate one thread, the program finishes in 1.78s. When I allocate 4, it finishes in 9.88s.

The code is as follows:

package main

import (
    "math/rand"
    "runtime"
    "sync"
    "time"
)

var workerWG sync.WaitGroup

func worker(fibNum chan int) {
    for {
        var tgt = <-fibNum
        workerWG.Add(1)
        var a, b float64 = 0, 1
        for i := 0; i < tgt; i++ {
            a, b = a+b, a
        }
        workerWG.Done()
    }
}

func main() {
    rand.Seed(time.Now().UnixNano())
    runtime.GOMAXPROCS(1) // LINE IN QUESTION

    var fibNum = make(chan int)

    for i := 0; i < 4; i++ {
        go worker(fibNum)
    }
    for i := 0; i < 500000; i++ {
        fibNum <- rand.Intn(1000)
    }
    workerWG.Wait()
}

If I replace runtime.GOMAXPROCS(1) with 4, the program takes four times as long to run.

What's going on here? Why does adding more available threads to a worker pool slow the entire pool down?

My personal theory is that the per-item processing time in the worker is smaller than the overhead of managing the threads, but I'm not sure. What makes me hesitate is the following test:

When I replace the worker function with the following code:

for {
    <-fibNum
    time.Sleep(500 * time.Millisecond)
}

both one available thread and four available threads take the same amount of time.
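
One way to sanity-check that theory (a sketch added for illustration, not part of the original question; fib, BenchmarkFib, and BenchmarkChannelSend are arbitrary names) is to time the two costs separately with go test -bench, in a _test.go file next to the program:

package main

import "testing"

// fib runs the same loop the worker runs for a single target.
func fib(tgt int) float64 {
    var a, b float64 = 0, 1
    for i := 0; i < tgt; i++ {
        a, b = a+b, a
    }
    return a
}

// BenchmarkFib measures the raw cost of the largest possible work item, fib(1000).
func BenchmarkFib(b *testing.B) {
    for i := 0; i < b.N; i++ {
        fib(1000)
    }
}

// BenchmarkChannelSend measures the cost of handing one item to another
// goroutine over an unbuffered channel, i.e. the per-item synchronization
// the worker pool pays.
func BenchmarkChannelSend(b *testing.B) {
    ch := make(chan int)
    go func() {
        for range ch {
        }
    }()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch <- 1000
    }
    close(ch)
}

If the channel-send benchmark comes out in the same ballpark as (or slower than) the Fibonacci loop, that supports the idea that the per-item work is too small to hide the handoff cost.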


3 answers (only the accepted answer is shown below)

  • douan7601 2015-09-16 09:37

    I revised your program to look like the following:

    package main
    
    import (
        "math/rand"
        "runtime"
        "sync"
        "time"
    )
    
    var workerWG sync.WaitGroup
    
    func worker(fibNum chan int) {
        for tgt := range fibNum {
            var a, b float64 = 0, 1
            for i := 0; i < tgt; i++ {
                a, b = a+b, a
            }
        }
        workerWG.Done()
    }
    
    func main() {
        rand.Seed(time.Now().UnixNano())
        runtime.GOMAXPROCS(1) // LINE IN QUESTION
    
        var fibNum = make(chan int)
    
        for i := 0; i < 4; i++ {
            go worker(fibNum)
            workerWG.Add(1)
        }
        for i := 0; i < 500000; i++ {
            fibNum <- rand.Intn(100000)
        }
        close(fibNum)
        workerWG.Wait()
    }
    
    • I cleaned up the wait group usage.
    • I changed rand.Intn(1000) to rand.Intn(100000) so each work item carries more computation.

    On my machine that produces:

    $ time go run threading.go (GOMAXPROCS=1)
    
    real    0m20.934s
    user    0m20.932s
    sys 0m0.012s
    
    $ time go run threading.go (GOMAXPROCS=8)
    
    real    0m10.634s
    user    0m44.184s
    sys 0m1.928s
    

    This means that in your original code, the work performed per item was negligible compared to the synchronization (channel read/write). The slowdown came from having to synchronize across several threads instead of one while doing only a very small amount of work in between.

    In essence, synchronization is expensive compared to calculating Fibonacci numbers up to 1000. This is why people tend to discourage micro-benchmarks: upping that number gives a better perspective, but an even better idea is to benchmark actual work being done, i.e. including I/O, syscalls, processing, crunching, writing output, formatting, etc.
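
    As an illustration (a sketch added here, not something the original answer proposed; batchSize is an arbitrary value), another way to change that ratio is to keep the cheap work items but amortize the synchronization by sending them in batches, so each channel operation carries many targets:

    package main
    
    import (
        "math/rand"
        "sync"
        "time"
    )
    
    const batchSize = 1000 // arbitrary: large enough to amortize the channel overhead
    
    var workerWG sync.WaitGroup
    
    // worker pays one channel receive per batchSize targets instead of one per target.
    func worker(batches chan []int) {
        defer workerWG.Done()
        for batch := range batches {
            for _, tgt := range batch {
                var a, b float64 = 0, 1
                for i := 0; i < tgt; i++ {
                    a, b = a+b, a
                }
            }
        }
    }
    
    func main() {
        rand.Seed(time.Now().UnixNano())
    
        batches := make(chan []int)
        for i := 0; i < 4; i++ {
            workerWG.Add(1)
            go worker(batches)
        }
    
        batch := make([]int, 0, batchSize)
        for i := 0; i < 500000; i++ {
            batch = append(batch, rand.Intn(1000))
            if len(batch) == batchSize {
                batches <- batch
                batch = make([]int, 0, batchSize)
            }
        }
        if len(batch) > 0 {
            batches <- batch
        }
        close(batches)
        workerWG.Wait()
    }

    With batching, even the original rand.Intn(1000) workload gives each worker a meaningful chunk of CPU time between synchronization points; a buffered channel is a lighter-weight variant of the same idea.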

    Edit: As an experiment, I upped the number of workers to 8 with GOMAXPROCS set to 8 and the result was:

    $ time go run threading.go 
    
    real    0m4.971s
    user    0m35.692s
    sys 0m0.044s
    
    This answer was accepted by the asker.