duandanbeng1829 2017-07-17 14:50
浏览 59
已采纳

函数调用会降低性能

For the following function:

func CycleClock(c *ballclock.Clock) int {
    for i := 0; i < fiveMinutesPerDay; i++ {
        c.TickFive()
    }

    return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}

where c.BallQueue is defined as []int and CalculateBallCycle is defined as func CalculateBallCycle(s []int) int. I am having a huge performance decrease between the for loop and the return statement.

I wrote the following benchmarks to test. The first benchmarks the entire function, the second benchmarks the for loop, while the third benchmarks the CalculateBallCycle function:

func BenchmarkCycleClock(b *testing.B) {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        j := i
        b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                c, _ := ballclock.NewClock(j)

                CycleClock(c)
            }
        })
    }
}

func BenchmarkCycle24(b *testing.B) {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        j := i
        b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                c, _ := ballclock.NewClock(j)

                for k := 0; k < fiveMinutesPerDay; k++ {
                    c.TickFive()
                }
            }
        })
    }
}

func BenchmarkCalculateBallCycle123(b *testing.B) {
    m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}

    for n := 0; n < b.N; n++ {
        CalculateBallCycle(m)
    }
}

Using 123 balls, this gives the following result:

BenchmarkCycleClock/BallCount=123-8                  200           9254136 ns/op
BenchmarkCycle24/BallCount=123-8                  200000              7610 ns/op
BenchmarkCalculateBallCycle123-8                 3000000               456 ns/op

Looking at this, there is a huge disparity between benchmarks. I would expect that the first benchmark would take roughly ~8000 ns/op since that would be the sum of the parts.

Here is the github repository.

EDIT:

I discovered that the result from the benchmark and the result from the running program are widely different. I took what @yazgazan found and modified the benchmark function in main.go mimic somewhat the BenchmarkCalculateBallCycle123 from main_test.go:

func Benchmark() {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        if i != 123 {
            continue
        }

        start := time.Now()

        t := CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})

        duration := time.Since(start)

        fmt.Printf("Ballclock with %v balls took %s;
", i, duration)
    }
}

This gave the output of:

Ballclock with 123 balls took 11.86748ms;

As you can see, the total time was 11.86 ms, all of which was spent in the CalculateBallCycle function. What would cause the benchmark to run in 456 ns/op while the running program runs in around 11867480 ms/op?

  • 写回答

3条回答 默认 最新

  • drfals1307 2017-07-18 09:23
    关注

    You wrote that CalcualteBallCycle() modifies the slice by design.

    I can't speak to correctness of that approach, but it is why benchmark time of BenchmarkCalculateBallCycle123 is so different.

    On first run it does the expected thing but on subsequent runs it does something completely different, because you're passing different data as input.

    Benchmark this modified code:

    func BenchmarkCalculateBallCycle123v2(b *testing.B) {
        m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
        for n := 0; n < b.N; n++ {
            tmp := append([]int{}, m...)
            CalculateBallCycle(tmp)
        }
    }
    

    This works-around this behavior by making a copy of m, so that CalculateBallCycle modifies a local copy.

    The running time becomes more like the others:

    BenchmarkCalculateBallCycle123-8         3000000           500 ns/op
    BenchmarkCalculateBallCycle123v2-8           100      10483347 ns/op
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥100 set_link_state
  • ¥15 虚幻5 UE美术毛发渲染
  • ¥15 CVRP 图论 物流运输优化
  • ¥15 Tableau online 嵌入ppt失败
  • ¥100 支付宝网页转账系统不识别账号
  • ¥15 基于单片机的靶位控制系统
  • ¥15 真我手机蓝牙传输进度消息被关闭了,怎么打开?(关键词-消息通知)
  • ¥15 装 pytorch 的时候出了好多问题,遇到这种情况怎么处理?
  • ¥20 IOS游览器某宝手机网页版自动立即购买JavaScript脚本
  • ¥15 手机接入宽带网线,如何释放宽带全部速度