duanqiongdu9916 2017-01-13 10:22
浏览 43
已采纳

在Go中同时生成随机数

I'm new to Go and to concurrent/parallel programming in general. In order to try out (and hopefully see the performance benefits of) goroutines, I've put together a small test program that simply generates 100 million random ints - first in a single goroutine, and then in as many goroutines as reported by runtime.NumCPU().

However, I consistently get worse performance using more goroutines than using a single one. I assume I'm missing something vital in either my programs design or the way in which I use goroutines/channels/other Go features. Any feedback is much appreciated.

I attach the code below.

package main

import "fmt"
import "time"
import "math/rand"
import "runtime"

func main() {
  // Figure out how many CPUs are available and tell Go to use all of them
  numThreads := runtime.NumCPU()
  runtime.GOMAXPROCS(numThreads)

  // Number of random ints to generate
  var numIntsToGenerate = 100000000
  // Number of ints to be generated by each spawned goroutine thread
  var numIntsPerThread = numIntsToGenerate / numThreads
  // Channel for communicating from goroutines back to main function
  ch := make(chan int, numIntsToGenerate)

  // Slices to keep resulting ints
  singleThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
  multiThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)

  fmt.Printf("Initiating single-threaded random number generation.
")
  startSingleRun := time.Now()
  // Generate all of the ints from a single goroutine, retrieve the expected
  // number of ints from the channel and put in target slice
  go makeRandomNumbers(numIntsToGenerate, ch)
  for i := 0; i < numIntsToGenerate; i++ {
    singleThreadIntSlice = append(singleThreadIntSlice,(<-ch))
  }
  elapsedSingleRun := time.Since(startSingleRun)
  fmt.Printf("Single-threaded run took %s
", elapsedSingleRun)


  fmt.Printf("Initiating multi-threaded random number generation.
")
  startMultiRun := time.Now()
  // Run the designated number of goroutines, each of which generates its
  // expected share of the total random ints, retrieve the expected number
  // of ints from the channel and put in target slice
  for i := 0; i < numThreads; i++ {
    go makeRandomNumbers(numIntsPerThread, ch)
  }
  for i := 0; i < numIntsToGenerate; i++ {
    multiThreadIntSlice = append(multiThreadIntSlice,(<-ch))
  }
  elapsedMultiRun := time.Since(startMultiRun)
  fmt.Printf("Multi-threaded run took %s
", elapsedMultiRun)
}


func makeRandomNumbers(numInts int, ch chan int) {
  source := rand.NewSource(time.Now().UnixNano())
  generator := rand.New(source)
  for i := 0; i < numInts; i++ {
        ch <- generator.Intn(numInts*100)
    }
}
  • 写回答

2条回答 默认 最新

  • dpa31905 2017-01-13 10:51
    关注

    First let's correct and optimize some things in your code:

    Since Go 1.5, GOMAXPROCS defaults to the number of CPU cores available, so no need to set that (although it does no harm).

    Numbers to generate:

    var numIntsToGenerate = 100000000
    var numIntsPerThread = numIntsToGenerate / numThreads
    

    If numThreads is like 3, in case of multi goroutines, you'll have less numbers generated (due to integer division), so let's correct it:

    numIntsToGenerate = numIntsPerThread * numThreads
    

    No need a buffer for 100 million values, reduce that to a sensible value (e.g. 1000):

    ch := make(chan int, 1000)
    

    If you want to use append(), the slices you create should have 0 length (and proper capacity):

    singleThreadIntSlice := make([]int, 0, numIntsToGenerate)
    multiThreadIntSlice := make([]int, 0, numIntsToGenerate)
    

    But in your case that's unnecessary, as only 1 goroutine is collecting the results, you can simply use indexing, and create slices like this:

    singleThreadIntSlice := make([]int, numIntsToGenerate)
    multiThreadIntSlice := make([]int, numIntsToGenerate)
    

    And when collecting results:

    for i := 0; i < numIntsToGenerate; i++ {
        singleThreadIntSlice[i] = <-ch
    }
    
    // ...
    
    for i := 0; i < numIntsToGenerate; i++ {
        multiThreadIntSlice[i] = <-ch
    }
    

    Ok. Code is now better. Attempting to run it, you will still experience that the multi-goroutine version runs slower. Why is that?

    It's because controlling, synchronizing and collecting results from multiple goroutines does have overhead. If the task they perform is little, the communication overhead will be greater and overall you lose performance.

    Your case is such a case. Generating a single random number once you set up your rand.Rand() is pretty fast.

    Let's modify your "task" to be big enough so that we can see the benefit of multiple goroutines:

    // 1 million is enough now:
    var numIntsToGenerate = 1000 * 1000
    
    
    func makeRandomNumbers(numInts int, ch chan int) {
        source := rand.NewSource(time.Now().UnixNano())
        generator := rand.New(source)
        for i := 0; i < numInts; i++ {
            // Kill time, do some processing:
            for j := 0; j < 1000; j++ {
                generator.Intn(numInts * 100)
            }
            // and now return a single random number
            ch <- generator.Intn(numInts * 100)
        }
    }
    

    In this case to get a random number, we generate 1000 random numbers and just throw them away (to make some calculation / kill time) before we generate the one we return. We do this so that the calculation time of the worker goroutines outweights the communication overhead of multiple goroutines.

    Running the app now, my results on a 4-core machine:

    Initiating single-threaded random number generation.
    Single-threaded run took 2.440604504s
    Initiating multi-threaded random number generation.
    Multi-threaded run took 987.946758ms
    

    The multi-goroutine version runs 2.5 times faster. This means if your goroutines would deliver random numbers in 1000-blocks, you would see 2.5 times faster execution (compared to the single goroutine generation).

    One last note:

    Your single-goroutine version also uses multiple goroutines: 1 to generate numbers and 1 to collect the results. Most likely the collector does not fully utilize a CPU core and mostly just waits for the results, but still: 2 CPU cores are used. Let's estimate that "1.5" CPU cores are utilized. While the multi-goroutine version utilizes 4 CPU cores. Just as a rough estimation: 4 / 1.5 = 2.66, very close to our performance gain.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 多电路系统共用电源的串扰问题
  • ¥15 slam rangenet++配置
  • ¥15 有没有研究水声通信方面的帮我改俩matlab代码
  • ¥15 对于相关问题的求解与代码
  • ¥15 ubuntu子系统密码忘记
  • ¥15 信号傅里叶变换在matlab上遇到的小问题请求帮助
  • ¥15 保护模式-系统加载-段寄存器
  • ¥15 电脑桌面设定一个区域禁止鼠标操作
  • ¥15 求NPF226060磁芯的详细资料
  • ¥15 使用R语言marginaleffects包进行边际效应图绘制