duanjia7912 2014-10-29 15:36
浏览 51
已采纳

去:多个len()调用与性能?

At the moment I am implementing some sorting algorithms. As it's in the nature of algorithms, there are a lot of calls on the length of some arrays/slices using the len() method.

Now, given the following code for a (part of) the Mergesort algorithm:

  for len(left) > 0 || len(right) > 0 {
        if len(left) > 0 && len(right) > 0 {
            if left[0] <= right[0] {
                result = append(result, left[0])
                left = left[1:len(left)]
            } else {
                result = append(result, right[0])
                right = right[1:len(right)]
            }
        } else if len(left) > 0 {
            result = append(result, left[0])
            left = left[1:len(left)]
        } else if len(right) > 0 {
            result = append(result, right[0])
            right = right[1:len(right)]
        }
    }

My question is: Do these multiple len() calls affect the performance of the algorithm negatively? Is it better to make a temporary variable for the length of the right and left slice? Or does the compiler does this itself?

  • 写回答

2条回答 默认 最新

  • dtj4307 2014-10-29 15:53
    关注

    There are two cases:

    • Local slice: length will be cached and there is no overhead
    • Global slice or passed (by reference): length cannot be cached and there is overhead

    No overhead for local slices

    For locally defined slices the length is cached, so there is no runtime overhead. You can see this in the assembly of the following program:

    func generateSlice(x int) []int {
        return make([]int, x)
    }
    
    func main() {
        x := generateSlice(10)
        println(len(x))
    }
    

    Compiled with go tool 6g -S test.go this yields, amongst other things, the following lines:

    MOVQ    "".x+40(SP),BX
    MOVQ    BX,(SP)
    // ...
    CALL    ,runtime.printint(SB)
    

    What happens here is that the first line retrieves the length of x by getting the value located 40 bytes from the beginning of x and most importantly caches this value in BX, which is then used for every occurrence of len(x). The reason for the offset is that an array has the following structure (source):

    typedef struct
    {               // must not move anything
        uchar   array[8];   // pointer to data
        uchar   nel[4];     // number of elements
        uchar   cap[4];     // allocated number of elements
    } Array;
    

    nel is what is accessed by len(). You can see this in the code generation as well.

    Global and referenced slices have overhead

    For shared values caching of the length is not possible since the compiler has to assume that the slice changes between calls. Therefore the compiler has to write code that accesses the length attribute directly every time. Example:

    func accessLocal() int {
        a := make([]int, 1000) // local
        count := 0
        for i := 0; i < len(a); i++ {
            count += len(a)
        }
        return count
    }
    
    var ag = make([]int, 1000) // pseudo-code
    
    func accessGlobal() int {
        count := 0
        for i := 0; i < len(ag); i++ {
            count += len(ag)
        }
        return count
    }
    

    Comparing the assembly of both functions yields the crucial difference that as soon as the variable is global the access to the nel attribute is not cached anymore and there will be a runtime overhead:

    // accessLocal
    MOVQ    "".a+8048(SP),SI // cache length in SI
    // ...
    CMPQ    SI,AX            // i < len(a)
    // ...
    MOVQ    SI,BX
    ADDQ    CX,BX
    MOVQ    BX,CX            // count += len(a)
    
    // accessGlobal
    MOVQ    "".ag+8(SB),BX
    CMPQ    BX,AX            // i < len(ag)
    // ...
    MOVQ    "".ag+8(SB),BX
    ADDQ    CX,BX
    MOVQ    BX,CX            // count += len(ag)
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 spss统计中二分类变量和有序变量的相关性分析可以用kendall相关分析吗?
  • ¥15 拟通过pc下指令到安卓系统,如果追求响应速度,尽可能无延迟,是不是用安卓模拟器会优于实体的安卓手机?如果是,可以快多少毫秒?
  • ¥20 神经网络Sequential name=sequential, built=False
  • ¥16 Qphython 用xlrd读取excel报错
  • ¥15 单片机学习顺序问题!!
  • ¥15 ikuai客户端多拨vpn,重启总是有个别重拨不上
  • ¥20 关于#anlogic#sdram#的问题,如何解决?(关键词-performance)
  • ¥15 相敏解调 matlab
  • ¥15 求lingo代码和思路
  • ¥15 公交车和无人机协同运输