douba9654 2013-04-15 17:26

Why are goroutines slower when distributed across multiple cores?

I was doing some experiments in Go and I found something really odd. When I run the following code on my computer it executes in ~0.5 seconds.

package main

import (
  "fmt"
  "runtime"
  "time"
)
func waitAround(die chan bool) {
  <- die
}
func main() {
  var startMemory runtime.MemStats
  runtime.ReadMemStats(&startMemory)

  start := time.Now()
  cpus := runtime.NumCPU()
  runtime.GOMAXPROCS(cpus)
  die := make(chan bool)
  count := 100000
  for i := 0; i < count; i++ {
    go waitAround(die)
  }
  elapsed := time.Since(start)

  var endMemory runtime.MemStats
  runtime.ReadMemStats(&endMemory)

  fmt.Printf("Started %d goroutines\n%d CPUs\n%f seconds\n",
    count, cpus, elapsed.Seconds())
  fmt.Printf("Memory before %d\nmemory after %d\n", startMemory.Alloc,
    endMemory.Alloc)
  fmt.Printf("%d goroutines running\n", runtime.NumGoroutine())
  fmt.Printf("%d bytes per goroutine\n", (endMemory.Alloc - startMemory.Alloc)/uint64(runtime.NumGoroutine()))

  close(die)
}

However, when I execute it using runtime.GOMAXPROCS(1) it executes much faster (~0.15 seconds). Can anybody explain to me why running many goroutines would be slower using more cores? Is there any significant overhead to multiplexing the goroutines onto multiple cores? I realize the goroutines aren't doing anything and it would probably be a different story if I had to wait for the routines to actually do something.


2 answers

  • dsavz66262 2013-04-15 17:34

    When running on a single core, goroutine allocation and switching are just a matter of internal accounting. Goroutines are never preempted, so the switching logic is extremely simple and very fast. More importantly in this case, your main goroutine does not yield at all, so the other goroutines never even begin executing before they're terminated. You allocate the structure and then delete it, and that's that. (Edit: this may not be true with newer versions of Go, but it is certainly more orderly with only one process.)
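
One way to check the claim that the goroutines may never start running before main moves on is to count how many have begun executing when the spawning loop ends. The sketch below is only an illustration: `spawnAndCount` is a hypothetical helper that is not part of the original post, and the numbers it reports depend on the Go version, the scheduler, and the hardware.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spawnAndCount is a hypothetical helper (not from the original post).
// It starts n goroutines under the given GOMAXPROCS setting. Each goroutine
// records that it has begun running and then blocks on die, like waitAround
// in the question. It returns how many had started when the spawning loop
// finished, and how many had started after main waited for all of them.
func spawnAndCount(procs, n int) (startedEarly, startedTotal int64) {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev) // restore the previous setting

	var started int64
	var ready sync.WaitGroup
	die := make(chan bool)

	ready.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			atomic.AddInt64(&started, 1)
			ready.Done()
			<-die
		}()
	}
	// Snapshot taken without yielding: goroutines counted here ran while
	// main was still busy spawning.
	startedEarly = atomic.LoadInt64(&started)

	ready.Wait() // main blocks here, so every goroutine gets a chance to run
	startedTotal = atomic.LoadInt64(&started)
	close(die)
	return startedEarly, startedTotal
}

func main() {
	const count = 100000
	for _, procs := range []int{1, runtime.NumCPU()} {
		early, total := spawnAndCount(procs, count)
		fmt.Printf("GOMAXPROCS=%d: %d of %d goroutines had started when the loop ended\n",
			procs, early, total)
	}
}
```

If the early count is far lower with a single processor than with several, that is consistent with the explanation above, although a preemptive scheduler in a newer Go release may blur the difference.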

    But when you map goroutines over multiple threads, you suddenly get OS-level context switching involved, which is orders of magnitude slower and more complex. And even if you're on multiple cores, there's a lot more work that has to be done. Plus, your goroutines may now actually be running before the program gets terminated.

    Try running the program under strace in both configurations and see how its behavior differs.

    This answer was selected by the asker as the best answer.