dongxiangchan0743 2018-11-19 13:58

Measuring the efficiency of Go's Once type

I have a piece of code that I want to run only once for initialization. So far I have been using a sync.Mutex combined with an if-clause to test whether it has already run. Later I came across the Once type and its Do() method in the same sync package.

The implementation is the following (https://golang.org/src/sync/once.go):

func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 1 {
        return
    }
    // Slow-path.
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}

Looking at the code, it is basically the same thing I've been using before: a mutex combined with an if-clause. However, the added function calls make this seem rather inefficient to me. I did some testing and tried various versions:

func test1() {
    o.Do(func() {
        // Do smth
    })
    wg.Done()
}

func test2() {
    m.Lock()
    if !b {
        func() {
            // Do smth
        }()
    }
    b = true
    m.Unlock()
    wg.Done()
}

func test3() {
    if !b {
        m.Lock()
        if !b {
            func() {
                // Do smth
            }()
            b = true
        }
        m.Unlock()
    }
    wg.Done()
}

I tested all versions by running the following code:

    wg.Add(10000)
    start = time.Now()
    for i := 0; i < 10000; i++ {
        go testX()
    }
    wg.Wait()
    end = time.Now()

    fmt.Printf("elapsed: %v\n", end.Sub(start).Nanoseconds())

with the following results:

elapsed: 8002700 //test1
elapsed: 5961600 //test2
elapsed: 5646700 //test3

Is it even worth using the Once type? It is convenient, but its performance is even worse than test2, which always serializes all goroutines.

Also, why are they using an atomic int for their if-clause? Storing happens inside the lock anyway.

Edit: Go playground link: https://play.golang.org/p/qlMxPYop7kS NOTICE: this doesn't show the results, as time is fixed in the playground.


  • dpjs2005 2018-11-19 14:18

    That is not how you're supposed to test code performance. You should use Go's built-in testing framework (the testing package and the go test command). See "Order of the code and performance" for details.

    Let's create the testable code:

    func f() {
        // Code that must only be run once
    }
    
    var testOnce = &sync.Once{}
    
    func DoWithOnce() {
        testOnce.Do(f)
    }
    
    var (
        mu = &sync.Mutex{}
        b  bool
    )
    
    func DoWithMutex() {
        mu.Lock()
        if !b {
            f()
            b = true
        }
        mu.Unlock()
    }
    

    Let's write proper testing / benchmarking code using the testing package:

    func BenchmarkOnce(b *testing.B) {
        for i := 0; i < b.N; i++ {
            DoWithOnce()
        }
    }
    
    func BenchmarkMutex(b *testing.B) {
        for i := 0; i < b.N; i++ {
            DoWithMutex()
        }
    }
    

    We can run the benchmark with the following command:

    go test -bench .
    

    And here are the benchmarking results:

    BenchmarkOnce-4         200000000                6.30 ns/op
    BenchmarkMutex-4        100000000               20.0 ns/op
    PASS
    

    As you can see, using sync.Once was about 3 times faster than using a sync.Mutex (6.30 ns/op versus 20.0 ns/op). Why? Because Once.Do() has an optimized fast path that uses only an atomic load to check whether the task has already run, and if so, no mutex is used. The slow path is likely taken only once, on the first call to Once.Do(). If many concurrent goroutines call DoWithOnce() at the start, the slow path might be reached multiple times, but in the long run Once.Do() only needs an atomic load.

    Parallel testing (from multiple goroutines)

    Yes, the above benchmarking code only uses a single goroutine. But using multiple concurrent goroutines will just make the mutex's case worse, as it always has to acquire the mutex even to check whether the task still needs to run, while sync.Once just uses an atomic load.

    Nevertheless, let's benchmark it.

    Here is the benchmarking code using parallel testing:

    func BenchmarkOnceParallel(b *testing.B) {
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                DoWithOnce()
            }
        })
    }
    
    func BenchmarkMutexParallel(b *testing.B) {
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                DoWithMutex()
            }
        })
    }
    

    I have 4 cores on my machine, so I'm gonna use those 4 cores:

    go test -bench Parallel -cpu=4
    

    (You may omit the -cpu flag, in which case it defaults to GOMAXPROCS, the number of cores available.)

    And here are the results:

    BenchmarkOnceParallel-4         500000000                3.04 ns/op
    BenchmarkMutexParallel-4        20000000                93.7 ns/op
    

    As concurrency increases, the results shift heavily in favor of sync.Once (in the above test, it's about 30 times faster).

    We may further increase the number of goroutines using testing.B.SetParallelism(), but I got similar results when I set it to 100 (which means 400 goroutines were used to call the benchmarking code).
