Linked list iteration in C++ is slower than in Go with analogous memory access

In a variety of contexts I've observed that linked list iteration is consistently slower in C++ than in Go by 10-15%. My first attempt at resolving this mystery on Stack Overflow is here. The example I coded up was problematic because:

1) memory access was unpredictable because of heap allocations, and

2) because there was no actual work being done, some people's compilers were optimizing away the main loop.

To resolve these issues I have a new program with implementations in C++ and Go. The C++ version takes 1.75 secs compared to 1.48 secs for the Go version. This time, I do one large heap allocation before timing begins and use it to operate an object pool from which I release and acquire nodes for the linked list. This way the memory access should be completely analogous between the two implementations.

Hopefully this makes the mystery more reproducible!

C++:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <stdexcept> // std::out_of_range
#include <boost/timer.hpp>

using namespace std;

struct Node {
    Node *next; // 8 bytes
    int age;   // 4 bytes
};

// Object pool, where every free slot points to the previous free slot
template<typename T, int n>
struct ObjPool
{
    typedef T*       pointer;
    typedef pointer* metapointer;

    ObjPool() :
        _top(NULL),
        _size(0)
    {
        pointer chunks = new T[n];
        for (int i=0; i < n; i++) {
            release(&chunks[i]);
        }
    }

    // Give an available pointer to the object pool
    void release(pointer ptr)
    {
        // Store the current pointer at the given address
        *(reinterpret_cast<metapointer>(ptr)) = _top;

        // Advance the pointer
        _top = ptr;

        // Increment the size
        ++_size;
    }

    // Pop an available pointer off the object pool for program use
    pointer acquire(void)
    {
        if(_size == 0){throw std::out_of_range("");}

        // Pop the top of the stack
        pointer retval = _top;

        // Step back to the previous address
        _top = *(reinterpret_cast<metapointer>(_top));

        // Decrement the size
        --_size;

        // Return the next free address
        return retval;
    }

    unsigned int size(void) const {return _size;}

protected:
    pointer _top;

    // Number of free slots available
    unsigned int _size;
};

Node *nodes = nullptr;
ObjPool<Node, 1000> p;

void processAge(int age) {
    // If the pool has no free slots left, pop the head of the linked list and
    // release it back to the pool
    if (p.size() == 0) {
        Node *head = nodes;
        nodes = nodes->next;
        p.release(head);
    }

    // Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
    Node *node = nodes;
    Node *prev = nullptr;
    while (true) {
        if (node == nullptr || age < node->age) {
            Node *newNode = p.acquire();
            newNode->age = age;
            newNode->next = node;

            if (prev == nullptr) {
                nodes = newNode;
            } else {
                prev->next = newNode;
            }

            return;
        }

        prev = node;
        node = node->next;
    }
}

int main() {
    Node x = {};
    std::cout << "Size of struct: " << sizeof(x) << "\n"; // 16 bytes

    boost::timer t;
    for (int i=0; i<1000000; i++) {
        processAge(i);
    }

    std::cout << t.elapsed() << "\n";
}

Go:

package main

import (
    "time"
    "fmt"
    "unsafe"
)

type Node struct {
    next *Node // 8 bytes
    age int32 // 4 bytes
}

// Every free slot points to the previous free slot
type NodePool struct {
    top *Node
    size int
}

func NewPool(n int) NodePool {
    p := NodePool{nil, 0}
    slots := make([]Node, n, n)
    for i := 0; i < n; i++ {
        p.Release(&slots[i])
    }

    return p
}

func (p *NodePool) Release(l *Node) {
    // Store the current top at the given address
    *((**Node)(unsafe.Pointer(l))) = p.top
    p.top = l
    p.size++
}

func (p *NodePool) Acquire() *Node {
    if p.size == 0 {
        fmt.Printf("Attempting to pop from empty pool!\n")
    }
    retval := p.top

    // Step back to the previous address in stack of addresses
    p.top = *((**Node)(unsafe.Pointer(p.top)))
    p.size--
    return retval
}

func processAge(age int32) {
    // If the pool has no free slots left, pop the head of the linked list and
    // release it back to the pool
    if p.size == 0 {
        head := nodes
        nodes = nodes.next
        p.Release(head)
    }

    // Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
    node := nodes
    var prev *Node = nil
    for true {
        if node == nil || age < node.age {
            newNode := p.Acquire()
            newNode.age = age
            newNode.next = node

            if prev == nil {
                nodes = newNode
            } else {
                prev.next = newNode
            }
            return
        }

        prev = node
        node = node.next
    }
}

// Linked list of nodes, in ascending order by age
var nodes *Node = nil
var p NodePool = NewPool(1000)

func main() {
    x := Node{};
    fmt.Printf("Size of struct: %d\n", unsafe.Sizeof(x)) // 16 bytes

    start := time.Now()
    for i := 0; i < 1000000; i++ {
        processAge(int32(i))
    }

    fmt.Printf("Time elapsed: %s\n", time.Since(start))
}

Output:

clang++ -std=c++11 -stdlib=libc++ minimalPool.cpp -O3; ./a.out
Size of struct: 16
1.7548

go run minimalPool.go
Size of struct: 16
Time elapsed: 1.487930629s

1 answer

dpswo40440 2018-05-10 23:49
The big difference between your two programs is that your Go code ignores errors (and will panic or segfault, if you're lucky, if you empty the pool), while your C++ code propagates errors via exception. Compare:

if p.size == 0 { fmt.Printf("Attempting to pop from empty pool!\n") }

vs.

if(_size == 0){throw std::out_of_range("");}

There are at least three ways¹ to make the comparison fair:

Change the C++ code to ignore the error, as you do in Go.

Change both versions to panic/abort on error.

Change the Go version to handle errors idiomatically,² as you do in C++.
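
As a rough sketch of the second option on the Go side, Acquire could panic when the pool is empty. This is not the code that was benchmarked: the pool is cut down to its essentials, and a plain assignment to the next field stands in for the question's unsafe cast (an assumption made to keep the sketch short).

```go
package main

import "fmt"

// Node mirrors the question's struct: one link and a 4-byte payload.
type Node struct {
	next *Node
	age  int32
}

// NodePool is a stack of free nodes threaded through their next fields.
type NodePool struct {
	top  *Node
	size int
}

// NewPool allocates n nodes up front and pushes them all onto the free stack.
func NewPool(n int) *NodePool {
	p := &NodePool{}
	slots := make([]Node, n)
	for i := range slots {
		p.Release(&slots[i])
	}
	return p
}

// Release pushes a node back onto the free stack.
// Assumption: field assignment replaces the question's unsafe cast.
func (p *NodePool) Release(n *Node) {
	n.next = p.top
	p.top = n
	p.size++
}

// Acquire pops a free node and panics if none is left, so an
// exhausted pool stops the program instead of being ignored.
func (p *NodePool) Acquire() *Node {
	if p.size == 0 {
		panic("attempting to pop from empty pool")
	}
	n := p.top
	p.top = n.next
	p.size--
	return n
}

func main() {
	p := NewPool(2)
	a := p.Acquire()
	b := p.Acquire()
	fmt.Println(a != b, p.size) // two distinct nodes, no free slots left

	// Recover only so the demo can report the panic and exit cleanly.
	defer func() { fmt.Println("recovered:", recover()) }()
	p.Acquire() // a third pop from a pool of two panics
}
```

The recover in main is there only for the demonstration. In the timed loop the panic is never triggered, so only the size check itself is on the hot path.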

So, let's do all of them and compare the results³:

C++ ignoring error: 1.059329s wall, 1.050000s user + 0.000000s system = 1.050000s CPU (99.1%)

C++ abort on error: 1.081585s wall, 1.060000s user + 0.000000s system = 1.060000s CPU (98.0%)

Go panic on error: Time elapsed: 1.152942427s

Go ignoring error: Time elapsed: 1.196426068s

Go idiomatic error handling: Time elapsed: 1.322005119s

C++ exception: 1.373458s wall, 1.360000s user + 0.000000s system = 1.360000s CPU (99.0%)

So:

Without error handling, C++ is faster than Go.

With panicking, Go gets faster,⁴ but still not as fast as C++.

With idiomatic error handling, C++ slows down a lot more than Go.

Why? This exception never actually happens in your test run, so the actual error-handling code never runs in either language. But clang can't prove that it doesn't happen. And, since you never catch the exception anywhere, that means it has to emit exception handlers and stack unwinders for every non-elided frame all the way up the stack. So it's doing more work on each function call and return—not much more work, but then your function is doing so little real work that the unnecessary extra work adds up.

¹ You could also change the C++ version to do C-style error handling, or to use an Option type, and probably other possibilities.

² This, of course, requires a lot more changes: you need to import errors, change the return type of Acquire to (*Node, error), change the return type of processAge to error, change all your return statements, and add at least two if err != nil { … } checks. But that's supposed to be a good thing about Go, right?

³ While I was at it, I replaced your legacy boost::timer with boost::timer::auto_cpu_timer, so we're now seeing wall clock time (as with Go) as well as CPU time.

⁴ I won't attempt to explain why, because I don't understand it. From a quick glance at the assembly, it's clearly optimized out some checks, but I can't see why it couldn't optimize out those same checks without the panic.
This answer was selected by the asker as the best answer.
