Function call degrades performance

For the following function:

func CycleClock(c *ballclock.Clock) int {
    for i := 0; i < fiveMinutesPerDay; i++ {
        c.TickFive()
    }

    return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}

where c.BallQueue has type []int and CalculateBallCycle has the signature func CalculateBallCycle(s []int) int. I see a huge performance drop between the for loop and the return statement.
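For context, the expression append([]int{}, c.BallQueue...) in CycleClock allocates a fresh backing array, so CalculateBallCycle receives a copy rather than the clock's own queue. The following is a minimal sketch of that difference. The helper overwrite is hypothetical and only stands in for a function that modifies its argument in place:

```go
package main

import "fmt"

// overwrite is a hypothetical stand-in for a function (such as
// CalculateBallCycle) that modifies the slice it is given.
func overwrite(s []int) {
	for i := range s {
		s[i] = 0
	}
}

func main() {
	queue := []int{3, 1, 2}

	// append([]int{}, queue...) copies into a new backing array,
	// so the callee only ever touches the copy.
	overwrite(append([]int{}, queue...))
	fmt.Println(queue) // [3 1 2]

	// Passing queue directly shares its backing array with the callee.
	overwrite(queue)
	fmt.Println(queue) // [0 0 0]
}
```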

I wrote the following benchmarks to test this. The first benchmarks the entire function, the second the for loop alone, and the third the CalculateBallCycle function:

func BenchmarkCycleClock(b *testing.B) {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        j := i
        b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                c, _ := ballclock.NewClock(j)

                CycleClock(c)
            }
        })
    }
}

func BenchmarkCycle24(b *testing.B) {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        j := i
        b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                c, _ := ballclock.NewClock(j)

                for k := 0; k < fiveMinutesPerDay; k++ {
                    c.TickFive()
                }
            }
        })
    }
}

func BenchmarkCalculateBallCycle123(b *testing.B) {
    m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}

    for n := 0; n < b.N; n++ {
        CalculateBallCycle(m)
    }
}

Using 123 balls, this gives the following result:

BenchmarkCycleClock/BallCount=123-8                  200           9254136 ns/op
BenchmarkCycle24/BallCount=123-8                  200000              7610 ns/op
BenchmarkCalculateBallCycle123-8                 3000000               456 ns/op

Looking at this, there is a huge disparity between the benchmarks. I would expect the first benchmark to take roughly 8000 ns/op (7610 + 456), since that would be the sum of the parts.

Here is the GitHub repository.

EDIT:

I discovered that the results from the benchmark and from the running program differ widely. I took what @yazgazan found and modified the Benchmark function in main.go to somewhat mimic BenchmarkCalculateBallCycle123 from main_test.go:

func Benchmark() {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        if i != 123 {
            continue
        }

        start := time.Now()

        t := CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})

        duration := time.Since(start)

        fmt.Printf("Ballclock with %v balls took %s;\n", i, duration)
    }
}

This gave the output of:

Ballclock with 123 balls took 11.86748ms;

As you can see, the total time was 11.87 ms, all of which was spent in the CalculateBallCycle function. What would cause the benchmark to run in 456 ns/op while the running program takes around 11867480 ns (11.87 ms) per call?

drfals1307 2017-07-18 09:23
You wrote that CalculateBallCycle() modifies the slice by design.

I can't speak to the correctness of that approach, but it is why the benchmark time of BenchmarkCalculateBallCycle123 is so different.

On the first iteration it does the expected work, but on subsequent iterations it does something completely different, because you're passing different data as input: the same slice m that the previous call already modified.
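To illustrate the effect in isolation, here is a small sketch using a hypothetical function, bubblePasses, whose cost depends on its input and which sorts that input in place. It is not CalculateBallCycle, only an assumed stand-in with the same two properties. Reusing one slice makes every call after the first much cheaper, while copying before each call keeps the work constant:

```go
package main

import "fmt"

// bubblePasses sorts s in place with bubble sort and returns how many
// passes it needed. Hypothetical stand-in for a function whose cost
// depends on its input and which also modifies that input.
func bubblePasses(s []int) int {
	passes := 0
	for {
		passes++
		swapped := false
		for i := 1; i < len(s); i++ {
			if s[i-1] > s[i] {
				s[i-1], s[i] = s[i], s[i-1]
				swapped = true
			}
		}
		if !swapped {
			return passes
		}
	}
}

func main() {
	orig := []int{5, 4, 3, 2, 1}

	// Reusing one slice, as BenchmarkCalculateBallCycle123 does with m:
	// only the first call sees the original, unsorted data.
	m := append([]int{}, orig...)
	for n := 0; n < 3; n++ {
		fmt.Println("shared input, passes:", bubblePasses(m)) // 5, then 1, then 1
	}

	// Copying before every call: each call does the same amount of work.
	for n := 0; n < 3; n++ {
		tmp := append([]int{}, orig...)
		fmt.Println("copied input, passes:", bubblePasses(tmp)) // 5 each time
	}
}
```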

Benchmark this modified code:

func BenchmarkCalculateBallCycle123v2(b *testing.B) {
    m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}

    for n := 0; n < b.N; n++ {
        // Copy m so every iteration starts from the same input.
        tmp := append([]int{}, m...)
        CalculateBallCycle(tmp)
    }
}

This works around the behavior by copying m on every iteration, so that CalculateBallCycle modifies a local copy.
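A possible variant, sketched here under the assumption that the per-iteration allocation from append is worth avoiding, is to allocate one scratch buffer up front and refill it with the built-in copy before each call. For a call that takes around 10 ms the allocation of 123 ints is negligible, so this mainly matters for much cheaper functions. The names runOnCopies and sumAndZero are hypothetical and shown as a plain program rather than a benchmark:

```go
package main

import "fmt"

// runOnCopies calls f iters times, each time on a fresh copy of orig.
// The scratch buffer is allocated once and refilled with copy, so f may
// modify its argument freely while orig is never touched.
func runOnCopies(orig []int, iters int, f func([]int) int) []int {
	buf := make([]int, len(orig))
	results := make([]int, 0, iters)
	for n := 0; n < iters; n++ {
		copy(buf, orig) // restore the original input before every call
		results = append(results, f(buf))
	}
	return results
}

// sumAndZero is a hypothetical function that reads and then destroys its input.
func sumAndZero(s []int) int {
	total := 0
	for i, v := range s {
		total += v
		s[i] = 0
	}
	return total
}

func main() {
	orig := []int{8, 62, 42}
	fmt.Println(runOnCopies(orig, 3, sumAndZero)) // [112 112 112]
	fmt.Println(orig)                             // [8 62 42]
}
```

Inside an actual benchmark the loop body would be copy(buf, m) followed by CalculateBallCycle(buf). b.StopTimer() and b.StartTimer() could also exclude the copy from the measurement, although toggling the timer on every iteration adds noticeable overhead of its own.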

The running time then becomes consistent with the others (the full CycleClock benchmark and the standalone program):

BenchmarkCalculateBallCycle123-8      3000000           500 ns/op
BenchmarkCalculateBallCycle123v2-8        100      10483347 ns/op
This answer was selected by the asker as the best answer.
