Why is goroutine allocation slower on multiple cores?

I was doing some experiments in Go and I found something really odd. When I run the following code on my computer it executes in ~0.5 seconds.

package main

import (
  "fmt"
  "runtime"
  "time"
)

// waitAround blocks until the die channel is closed.
func waitAround(die chan bool) {
  <-die
}

func main() {
  var startMemory runtime.MemStats
  runtime.ReadMemStats(&startMemory)

  // Time only the creation of the goroutines.
  start := time.Now()
  cpus := runtime.NumCPU()
  runtime.GOMAXPROCS(cpus)
  die := make(chan bool)
  count := 100000
  for i := 0; i < count; i++ {
    go waitAround(die)
  }
  elapsed := time.Since(start)

  var endMemory runtime.MemStats
  runtime.ReadMemStats(&endMemory)

  fmt.Printf("Started %d goroutines\n%d CPUs\n%f seconds\n",
    count, cpus, elapsed.Seconds())
  fmt.Printf("Memory before %d\nmemory after %d\n", startMemory.Alloc,
    endMemory.Alloc)
  fmt.Printf("%d goroutines running\n", runtime.NumGoroutine())
  fmt.Printf("%d bytes per goroutine\n", (endMemory.Alloc-startMemory.Alloc)/uint64(runtime.NumGoroutine()))

  // Release every waiting goroutine.
  close(die)
}

However, when I execute it using runtime.GOMAXPROCS(1) it executes much faster (~0.15 seconds). Can anybody explain to me why running many goroutines would be slower using more cores? Is there any significant overhead to multiplexing the goroutines onto multiple cores? I realize the goroutines aren't doing anything and it would probably be a different story if I had to wait for the routines to actually do something.

dsavz66262 2013-04-15 17:34
When running on a single core, goroutine allocation and switching is just a matter of internal accounting. Goroutines are never preempted, so the switching logic is extremely simple and very fast. More importantly in this case, your main routine does not yield at all, so the goroutines never even begin executing before they are terminated. You allocate the structure and then delete it, and that's that. (Edit: this may not be true with newer versions of Go, but execution is certainly more orderly with only one processor.)
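One way to check the claim that the goroutines never start before main yields is to count how many of them have begun running by the time the spawning loop ends. The following is a minimal sketch, not part of the original answer: `startedBeforeYield` is a hypothetical helper, and the count it reports depends on the Go version (newer runtimes can preempt goroutines) and on the machine.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// startedBeforeYield (hypothetical helper) spawns n goroutines with
// GOMAXPROCS set to procs and reports how many of them had begun running
// by the time the spawning loop finished.
func startedBeforeYield(procs, n int) int64 {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev) // restore the previous setting

	var started int64
	var wg sync.WaitGroup
	die := make(chan bool)

	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&started, 1) // record that this goroutine ran
			<-die
		}()
	}
	// Snapshot taken before main blocks on anything.
	seen := atomic.LoadInt64(&started)

	close(die)
	wg.Wait()
	return seen
}

func main() {
	const n = 1000
	for _, procs := range []int{1, runtime.NumCPU()} {
		fmt.Printf("GOMAXPROCS=%d: %d of %d goroutines started before main yielded\n",
			procs, startedBeforeYield(procs, n), n)
	}
}
```

With a single processor the snapshot is typically at or near zero, while with several processors other threads may pick up goroutines while the loop is still running.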

But when you map goroutines over multiple threads, you suddenly get OS-level context switching involved, which is orders of magnitude slower and more complex. Even if you're on multiple cores, there's a lot more work that has to be done. Plus, your goroutines may now actually be running before the program terminates.
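To compare the two configurations directly, the spawn loop from the question can be timed under each GOMAXPROCS value in a single run. This is a rough sketch under stated assumptions: `spawnTime` is a hypothetical helper, a single measurement is noisy, and the gap between the two settings may differ from the figures in the question depending on the Go version and hardware.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// spawnTime (hypothetical helper) measures how long it takes to start n
// goroutines that block on a channel, with GOMAXPROCS set to procs.
func spawnTime(procs, n int) time.Duration {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev) // restore the previous setting

	var wg sync.WaitGroup
	die := make(chan bool)
	wg.Add(n)

	start := time.Now()
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-die
		}()
	}
	elapsed := time.Since(start) // only the spawn loop is timed

	// Release the goroutines and wait so they do not leak into the next run.
	close(die)
	wg.Wait()
	return elapsed
}

func main() {
	const n = 100000
	for _, procs := range []int{1, runtime.NumCPU()} {
		fmt.Printf("GOMAXPROCS=%d: spawned %d goroutines in %v\n",
			procs, n, spawnTime(procs, n))
	}
}
```

Repeating each measurement several times and comparing medians gives a steadier picture than one run.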

Try running the program under strace in both configurations and see how its behavior differs.
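As an in-process complement to strace (a different technique, not a replacement for it), the Go runtime's predefined "threadcreate" profile reports how many OS threads the process has created so far. The sketch below assumes a hypothetical helper, `threadsAfterSpawn`; the count is cumulative for the whole process, so it only shows whether raising GOMAXPROCS led the runtime to create more threads.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// threadsAfterSpawn (hypothetical helper) runs n blocking goroutines with
// GOMAXPROCS set to procs, then returns the cumulative number of OS threads
// recorded in the runtime's "threadcreate" profile.
func threadsAfterSpawn(procs, n int) int {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev) // restore the previous setting

	var wg sync.WaitGroup
	die := make(chan bool)
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-die
		}()
	}
	close(die)
	wg.Wait()

	// The count never decreases: it covers every thread created so far.
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	for _, procs := range []int{1, runtime.NumCPU()} {
		fmt.Printf("GOMAXPROCS=%d: %d OS threads created so far\n",
			procs, threadsAfterSpawn(procs, 100000))
	}
}
```

strace still shows more detail, such as the individual system calls each thread makes.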

This answer was selected by the asker as the best answer.
