doujiufutaog59220 2014-10-27 10:47
Goroutine execution time with different input data

I am experimenting with goroutines to parallelize some computation. However, the execution time of the goroutines confuses me. My experiment setup is simple.

runtime.GOMAXPROCS(3)

datalen := 1000000000
data21 := make([]float64, datalen)
data22 := make([]float64, datalen)
data23 := make([]float64, datalen)

t := time.Now()
res := make(chan interface{}, 3) // one buffered slot per goroutine

go func() {
    for i := 0; i < datalen; i++ {
        data22[i] = math.Sqrt(13)
    }
    res <- true
}()

go func() {
    for i := 0; i < datalen; i++ {
        data22[i] = math.Sqrt(13)
    }
    res <- true
}()

go func() {
    for i := 0; i < datalen; i++ {
        data22[i] = math.Sqrt(13)
    }
    res <- true
}()

for i := 0; i < 3; i++ {
    <-res
}
fmt.Printf("The parallel for loop took %v to run.\n", time.Since(t))

Notice that all three goroutines write to the same slice, data22. The execution time for this program is

The parallel for loop took 7.436060182s to run.

However, if I let each goroutine handle different data as follows:

runtime.GOMAXPROCS(3)

datalen := 1000000000
data21 := make([]float64, datalen)
data22 := make([]float64, datalen)
data23 := make([]float64, datalen)

t := time.Now()
res := make(chan interface{}, 3) // one buffered slot per goroutine

go func() {
    for i := 0; i < datalen; i++ {
        data21[i] = math.Sqrt(13)
    }
    res <- true
}()

go func() {
    for i := 0; i < datalen; i++ {
        data22[i] = math.Sqrt(13)
    }
    res <- true
}()

go func() {
    for i := 0; i < datalen; i++ {
        data23[i] = math.Sqrt(13)
    }
    res <- true
}()

for i := 0; i < 3; i++ {
    <-res
}
fmt.Printf("The parallel for loop took %v to run.\n", time.Since(t))

The execution time for this version is almost 3 times longer than the previous one, and is about equal to or worse than sequential execution without goroutines:

The parallel for loop took 20.744438468s to run.

I guess I may be using goroutines the wrong way. What is the correct way to use multiple goroutines to handle different pieces of data?

2 answers

  • du_1993 2014-10-27 11:29

    Since your example program is not performing any substantial calculation, the bottleneck is going to be the speed at which data can be written to memory. With the settings in the example, we're talking about 24 GB (roughly 22 GiB) of writes, that is, three slices of 10^9 float64 values at 8 bytes each, which is not insignificant.
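The size estimate can be checked with a few lines of Go. This is only a sketch of the arithmetic; the helper name `totalWriteBytes` is made up for illustration and is not part of the question's code.

```go
package main

import "fmt"

// totalWriteBytes (hypothetical helper) returns how many bytes are written
// when numSlices slices of elemsPerSlice float64 values are each filled once.
func totalWriteBytes(numSlices, elemsPerSlice int64) int64 {
	const bytesPerFloat64 = 8
	return numSlices * elemsPerSlice * bytesPerFloat64
}

func main() {
	// Three slices of 1e9 elements, as in the question.
	b := totalWriteBytes(3, 1000000000)
	fmt.Printf("%d bytes = %.1f GB = %.2f GiB\n",
		b, float64(b)/1e9, float64(b)/(1<<30))
	// prints "24000000000 bytes = 24.0 GB = 22.35 GiB"
}
```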

    Given the difference in run time between the two examples, one likely explanation is that the first example isn't actually writing as much to RAM. Since memory writes are cached by the CPU, the execution probably looks something like this:

    1. The first goroutine writes data to a cache line representing the start of the data22 array.
    2. The second goroutine writes data to a cache line representing the same location. The CPU running the first goroutine notices that this write invalidates its own cached write, so it throws away its changes.
    3. The third goroutine writes data to a cache line representing the same location. The CPU running the second goroutine notices that this write invalidates its own cached write, so it throws away its changes.
    4. The cache line in the third CPU is evicted and the changes are written out to RAM.

    This process continues as the goroutines progress through the data22 array. Since RAM is the bottleneck and the first example ends up writing one third as much data, it isn't that surprising that it runs approximately 3 times as fast as the second example.
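As for the question of how to give each goroutine its own piece of data, one common pattern is to split a single slice into contiguous chunks and wait on a sync.WaitGroup. The sketch below assumes that pattern; `fillParallel` is a hypothetical helper, and the slice is far smaller than the question's 10^9 elements so it runs on an ordinary machine. If memory bandwidth is the bottleneck, as suggested above, this layout will not make the fill much faster than a sequential loop; it only shows a race-free way to divide the work.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// fillParallel (hypothetical helper) sets every element of data to v,
// using up to `workers` goroutines that each own one contiguous chunk.
func fillParallel(data []float64, v float64, workers int) {
	if workers < 1 {
		workers = 1
	}
	var wg sync.WaitGroup
	chunk := (len(data) + workers - 1) / workers // ceiling division
	for start := 0; start < len(data); start += chunk {
		end := start + chunk
		if end > len(data) {
			end = len(data)
		}
		wg.Add(1)
		// Pass the sub-slice as an argument so each goroutine touches
		// only its own range of the underlying array.
		go func(part []float64) {
			defer wg.Done()
			for i := range part {
				part[i] = v
			}
		}(data[start:end])
	}
	wg.Wait() // block until every chunk has been written
}

func main() {
	data := make([]float64, 10000000) // assumption: much smaller than 1e9
	t := time.Now()
	fillParallel(data, math.Sqrt(13), 3)
	fmt.Printf("The parallel fill took %v to run.\n", time.Since(t))
}
```

Timing this against a plain sequential loop over the same slice is a simple way to see whether the workload is limited by computation or by memory writes on a given machine.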
