dt4233 2018-05-10 22:36

Linked list iteration in C++ is slower than in Go, despite comparable memory access

In a variety of contexts I've observed that linked list iteration is consistently slower in C++ than in Go by 10-15%. My first attempt at resolving this mystery on Stack Overflow is here. The example I coded up was problematic because:

1) memory access was unpredictable because of heap allocations, and

2) because there was no actual work being done, some people's compilers were optimizing away the main loop.

To resolve these issues I have a new program with implementations in C++ and Go. The C++ version takes 1.75 secs compared to 1.48 secs for the Go version. This time, I do one large heap allocation before timing begins and use it to operate an object pool from which I release and acquire nodes for the linked list. This way the memory access should be completely analogous between the two implementations.

Hopefully this makes the mystery more reproducible!

C++:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/timer.hpp>

using namespace std;

struct Node {
    Node *next; // 8 bytes
    int age;   // 4 bytes
};

// Object pool, where every free slot points to the previous free slot
template<typename T, int n>
struct ObjPool
{
    typedef T*       pointer;
    typedef pointer* metapointer;

    ObjPool() :
        _top(NULL),
        _size(0)
    {
        pointer chunks = new T[n];
        for (int i=0; i < n; i++) {
            release(&chunks[i]);
        }
    }

    // Give an available pointer back to the object pool
    void release(pointer ptr)
    {
        // Store the current pointer at the given address
        *(reinterpret_cast<metapointer>(ptr)) = _top;

        // Advance the pointer
        _top = ptr;

        // Increment the size
        ++_size;
    }

    // Pop an available pointer off the object pool for program use
    pointer acquire(void)
    {
        if(_size == 0){throw std::out_of_range("");}

        // Pop the top of the stack
        pointer retval = _top;

        // Step back to the previous address
        _top = *(reinterpret_cast<metapointer>(_top));

        // Decrement the size
        --_size;

        // Return the next free address
        return retval;
    }

    unsigned int size(void) const {return _size;}

protected:
    pointer _top;

    // Number of free slots available
    unsigned int _size;
};

Node *nodes = nullptr;
ObjPool<Node, 1000> p;

void processAge(int age) {
    // If the pool has no free slots left, pop the head of the linked list and
    // release it back to the pool
    if (p.size() == 0) {
        Node *head = nodes;
        nodes = nodes->next;
        p.release(head);
    }

    // Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
    Node *node = nodes;
    Node *prev = nullptr;
    while (true) {
        if (node == nullptr || age < node->age) {
            Node *newNode = p.acquire();
            newNode->age = age;
            newNode->next = node;

            if (prev == nullptr) {
                nodes = newNode;
            } else {
                prev->next = newNode;
            }

            return;
        }

        prev = node;
        node = node->next;
    }
}

int main() {
    Node x = {};
    std::cout << "Size of struct: " << sizeof(x) << "
"; // 16 bytes

    boost::timer t;
    for (int i=0; i<1000000; i++) {
        processAge(i);
    }

    std::cout << t.elapsed() << "\n";
}

Go:

package main

import (
    "time"
    "fmt"
    "unsafe"
)

type Node struct {
    next *Node // 8 bytes
    age int32 // 4 bytes
}

// Every free slot points to the previous free slot
type NodePool struct {
    top *Node
    size int
}

func NewPool(n int) NodePool {
    p := NodePool{nil, 0}
    slots := make([]Node, n, n)
    for i := 0; i < n; i++ {
        p.Release(&slots[i])
    }

    return p
}

func (p *NodePool) Release(l *Node) {
    // Store the current top at the given address
    *((**Node)(unsafe.Pointer(l))) = p.top
    p.top = l
    p.size++
}

func (p *NodePool) Acquire() *Node {
    if p.size == 0 {
        fmt.Printf("Attempting to pop from empty pool!
")
    }
    retval := p.top

    // Step back to the previous address in stack of addresses
    p.top = *((**Node)(unsafe.Pointer(p.top)))
    p.size--
    return retval
}

func processAge(age int32) {
    // If the pool has no free slots left, pop the head of the linked list and
    // release it back to the pool
    if p.size == 0 {
        head := nodes
        nodes = nodes.next
        p.Release(head)
    }

    // Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
    node := nodes
    var prev *Node = nil
    for true {
        if node == nil || age < node.age {
            newNode := p.Acquire()
            newNode.age = age
            newNode.next = node

            if prev == nil {
                nodes = newNode
            } else {
                prev.next = newNode
            }
            return
        }

        prev = node
        node = node.next
    }
}

// Linked list of nodes, in ascending order by age
var nodes *Node = nil
var p NodePool = NewPool(1000)

func main() {
    x := Node{}
    fmt.Printf("Size of struct: %d\n", unsafe.Sizeof(x)) // 16 bytes

    start := time.Now()
    for i := 0; i < 1000000; i++ {
        processAge(int32(i))
    }

    fmt.Printf("Time elapsed: %s
", time.Since(start))
}

Output:

clang++ -std=c++11 -stdlib=libc++ minimalPool.cpp -O3; ./a.out
Size of struct: 16
1.7548

go run minimalPool.go
Size of struct: 16
Time elapsed: 1.487930629s

1 Answer

  • dpswo40440 2018-05-10 23:49

    The big difference between your two programs is that your Go code ignores errors (if you empty the pool, it will panic or segfault if you're lucky), while your C++ code propagates errors via exceptions. Compare:

    if p.size == 0 {
        fmt.Printf("Attempting to pop from empty pool!
    ")
    }
    

    vs.

    if(_size == 0){throw std::out_of_range("");}
    

    There are at least three ways [1] to make the comparison fair:

    1. Change the C++ code to ignore the error, as you do in Go.
    2. Change both versions to panic/abort on error (a Go sketch of this change follows the list).
    3. Change the Go version to handle errors idiomatically [2], as you do in C++.
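
    For concreteness, here is roughly what option 2 looks like on the Go side. This is only a sketch, assuming the Node, NodePool, nodes, and p definitions from the listing above; the only change from the original Acquire is that the Printf becomes a panic:

    // Acquire pops a free slot off the pool, panicking if none are left.
    func (p *NodePool) Acquire() *Node {
        if p.size == 0 {
            panic("attempting to pop from empty pool")
        }
        retval := p.top

        // Step back to the previous free slot in the stack of addresses
        p.top = *((**Node)(unsafe.Pointer(p.top)))
        p.size--
        return retval
    }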

    So, let's do all of them and compare the results [3]:

    • C++ ignoring error: 1.059329s wall, 1.050000s user + 0.000000s system = 1.050000s CPU (99.1%)
    • C++ abort on error: 1.081585s wall, 1.060000s user + 0.000000s system = 1.060000s CPU (98.0%)
    • Go panic on error: Time elapsed: 1.152942427s
    • Go ignoring error: Time elapsed: 1.196426068s
    • Go idiomatic error handling: Time elapsed: 1.322005119s
    • C++ exception: 1.373458s wall, 1.360000s user + 0.000000s system = 1.360000s CPU (99.0%)

    So:

    • Without error handling, C++ is faster than Go.
    • With panicking, Go gets faster [4], but still not as fast as C++.
    • With idiomatic error handling, C++ slows down a lot more than Go.

    Why? This exception never actually happens in your test run, so the actual error-handling code never runs in either language. But clang can't prove that it doesn't happen. And, since you never catch the exception anywhere, that means it has to emit exception handlers and stack unwinders for every non-elided frame all the way up the stack. So it's doing more work on each function call and return—not much more work, but then your function is doing so little real work that the unnecessary extra work adds up.


    1. You could also change the C++ version to do C-style error handling, or to use an Option type; there are probably other possibilities as well.

    2. This, of course, requires a lot more changes: you need to import errors, change the return type of Acquire to (*Node, error), change the return type of processAge to error, change all your return statements, and add at least two if err != nil { … } checks. But that's supposed to be a good thing about Go, right?
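
    For illustration, here is a minimal sketch of what that idiomatic version might look like, assuming the types and globals from the listing above and with "errors" added to the import block (the question doesn't show this variant, so the exact shape is a guess):

    // Acquire returns an error instead of printing and carrying on.
    func (p *NodePool) Acquire() (*Node, error) {
        if p.size == 0 {
            return nil, errors.New("attempting to pop from empty pool")
        }
        retval := p.top
        p.top = *((**Node)(unsafe.Pointer(p.top)))
        p.size--
        return retval, nil
    }

    // processAge now propagates the error to its caller.
    func processAge(age int32) error {
        if p.size == 0 {
            head := nodes
            nodes = nodes.next
            p.Release(head)
        }

        node := nodes
        var prev *Node
        for {
            if node == nil || age < node.age {
                newNode, err := p.Acquire()
                if err != nil {
                    return err
                }
                newNode.age = age
                newNode.next = node
                if prev == nil {
                    nodes = newNode
                } else {
                    prev.next = newNode
                }
                return nil
            }
            prev = node
            node = node.next
        }
    }

    The loop in main then needs its own check, e.g. if err := processAge(int32(i)); err != nil { panic(err) }, which accounts for the second if err != nil.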

    3. While I was at it, I replaced your legacy boost::timer with boost::auto_cpu_timer, so we're now seeing wall clock time (as with Go) as well as CPU time.

    4. I won't attempt to explain why, because I don't understand it. From a quick glance at the assembly, it's clearly optimized out some checks, but I can't see why it couldn't optimize out those same checks without the panic.

    This answer was accepted as the best answer by the asker.
