Why is locking in Go much slower than in Java? Lots of time spent in Mutex.Lock() and Mutex.Unlock()

I've written a small Go library (go-patan) that collects a running min/max/avg/stddev of certain variables. I compared it to an equivalent Java implementation (patan), and to my surprise the Java implementation is much faster. I would like to understand why.

The library basically consists of a simple data store with a lock that serializes reads and writes. This is a snippet of the code:

type Store struct {
   durations map[string]*Distribution
   counters  map[string]int64
   samples   map[string]*Distribution

   lock *sync.Mutex
}

func (store *Store) addSample(key string, value int64) {
  store.addToStore(store.samples, key, value)
}

func (store *Store) addDuration(key string, value int64) {
  store.addToStore(store.durations, key, value)
}

func (store *Store) addToCounter(key string, value int64) {
  store.lock.Lock()
  defer store.lock.Unlock()
  store.counters[key] = store.counters[key] + value
}

func (store *Store) addToStore(destination map[string]*Distribution, key string, value int64) {
  store.lock.Lock()
  defer store.lock.Unlock()
  distribution, exists := destination[key]
  if !exists {
    distribution = NewDistribution()
    destination[key] = distribution
  }
  distribution.addSample(value)
}

I've benchmarked the Go and Java implementations (go-benchmark-gist, java-benchmark-gist) and Java wins by far, but I don't understand why:

Go Results:
10 threads with 20000 items took 133 millis
100 threads with 20000 items took 1809 millis
1000 threads with 20000 items took 17576 millis
10 threads with 200000 items took 1228 millis
100 threads with 200000 items took 17900 millis

Java Results:
10 threads with 20000 items takes 89 millis
100 threads with 20000 items takes 265 millis
1000 threads with 20000 items takes 2888 millis  
10 threads with 200000 items takes 311 millis
100 threads with 200000 items takes 3067 millis
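The benchmark gists themselves are not reproduced here, so the following is only a minimal sketch of what such a contention benchmark might look like. The names `counterStore` and `runBenchmark` are hypothetical, and the store is reduced to a single mutex-guarded counter map rather than the full `Store` from the question.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// counterStore is a hypothetical stand-in for the Store in the question:
// one mutex serializing all access to a map.
type counterStore struct {
	mu       sync.Mutex
	counters map[string]int64
}

// add performs one locked update, like addToCounter in the question
// (without defer, to keep the critical section as small as possible).
func (s *counterStore) add(key string, value int64) {
	s.mu.Lock()
	s.counters[key] += value
	s.mu.Unlock()
}

// runBenchmark starts `workers` goroutines that each perform `items`
// locked updates on the same key. It returns the final counter value
// and the elapsed wall-clock time.
func runBenchmark(workers, items int) (int64, time.Duration) {
	store := &counterStore{counters: make(map[string]int64)}
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < items; i++ {
				store.add("my.counter", 1)
			}
		}()
	}
	wg.Wait() // all writers are done, so reading the map below is safe
	return store.counters["my.counter"], time.Since(start)
}

func main() {
	total, elapsed := runBenchmark(10, 20000)
	fmt.Printf("10 threads with 20000 items took %d millis (total=%d)\n",
		elapsed.Milliseconds(), total)
}
```

Because every goroutine hammers the same mutex with almost no work in between, a loop like this mostly measures lock handoff cost rather than the statistics code, which is consistent with the profile in the question.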

I've profiled the program with Go's pprof and generated a call graph (call-graph). It shows that the program spends essentially all of its time in sync.(*Mutex).Lock() and sync.(*Mutex).Unlock().

The top 20 calls according to the profiler:

(pprof) top20
59110ms of 73890ms total (80.00%)
Dropped 22 nodes (cum <= 369.45ms)
Showing top 20 nodes out of 65 (cum >= 50220ms)
      flat  flat%   sum%        cum   cum%
    8900ms 12.04% 12.04%     8900ms 12.04%  runtime.futex
    7270ms  9.84% 21.88%     7270ms  9.84%  runtime/internal/atomic.Xchg
    7020ms  9.50% 31.38%     7020ms  9.50%  runtime.procyield
    4560ms  6.17% 37.56%     4560ms  6.17%  sync/atomic.CompareAndSwapUint32
    4400ms  5.95% 43.51%     4400ms  5.95%  runtime/internal/atomic.Xadd
    4210ms  5.70% 49.21%    22040ms 29.83%  runtime.lock
    3650ms  4.94% 54.15%     3650ms  4.94%  runtime/internal/atomic.Cas
    3260ms  4.41% 58.56%     3260ms  4.41%  runtime/internal/atomic.Load
    2220ms  3.00% 61.56%    22810ms 30.87%  sync.(*Mutex).Lock
    1870ms  2.53% 64.10%     1870ms  2.53%  runtime.osyield
    1540ms  2.08% 66.18%    16740ms 22.66%  runtime.findrunnable
    1430ms  1.94% 68.11%     1430ms  1.94%  runtime.freedefer
    1400ms  1.89% 70.01%     1400ms  1.89%  sync/atomic.AddUint32
    1250ms  1.69% 71.70%     1250ms  1.69%  github.com/toefel18/go-patan/statistics/lockbased.(*Distribution).addSample
    1240ms  1.68% 73.38%     3140ms  4.25%  runtime.deferreturn
    1070ms  1.45% 74.83%     6520ms  8.82%  runtime.systemstack
    1010ms  1.37% 76.19%     1010ms  1.37%  runtime.newdefer
    1000ms  1.35% 77.55%     1000ms  1.35%  runtime.mapaccess1_faststr
     950ms  1.29% 78.83%    15660ms 21.19%  runtime.semacquire
     860ms  1.16% 80.00%    50220ms 67.97%  main.Benchmrk.func1
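For readers who want to reproduce a profile like the one above, here is a small sketch using the standard `runtime/pprof` package. The helper names `profileTo` and `contendedWork` and the output file name `cpu.prof` are assumptions made for illustration; the workload is a simplified contended counter, not the benchmark from the question.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
	"sync"
)

// contendedWork is a hypothetical workload: many goroutines
// incrementing one counter under a single mutex.
func contendedWork(workers, items int) int64 {
	var mu sync.Mutex
	var total int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < items; i++ {
				mu.Lock()
				total++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

// profileTo records a CPU profile of fn into the file at path.
func profileTo(path string, fn func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close() // runs after StopCPUProfile (defers are LIFO)
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()
	fn()
	return nil
}

func main() {
	err := profileTo("cpu.prof", func() { contendedWork(100, 20000) })
	if err != nil {
		log.Fatal(err)
	}
}
```

The resulting file can then be inspected with `go tool pprof cpu.prof` and the `top20` command shown above (older Go releases also expect the binary path as the first argument).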

Can someone explain why locking in Go seems to be so much slower than in Java? What am I doing wrong? I've also written a channel-based implementation in Go, but that is even slower.
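The channel-based implementation is not shown in the question, so the sketch below is only a guess at its general shape: a single goroutine owns the map and all writers send updates to it over a channel. The names `channelStore`, `update`, `newChannelStore` and `stop` are hypothetical.

```go
package main

import "fmt"

// update is one counter increment sent to the owner goroutine.
type update struct {
	key   string
	value int64
}

// channelStore serializes writes by funnelling them through a channel
// to a single goroutine that owns the map, instead of using a mutex.
type channelStore struct {
	updates chan update
	done    chan map[string]int64
}

func newChannelStore() *channelStore {
	s := &channelStore{
		updates: make(chan update, 128),
		done:    make(chan map[string]int64),
	}
	go func() {
		counters := make(map[string]int64) // touched only by this goroutine
		for u := range s.updates {
			counters[u.key] += u.value
		}
		s.done <- counters // hand the final state back once updates is closed
	}()
	return s
}

// addToCounter sends one update; it blocks when the buffer is full.
func (s *channelStore) addToCounter(key string, value int64) {
	s.updates <- update{key, value}
}

// stop closes the update channel and returns the final counters.
func (s *channelStore) stop() map[string]int64 {
	close(s.updates)
	return <-s.done
}

func main() {
	s := newChannelStore()
	for i := 0; i < 1000; i++ {
		s.addToCounter("my.counter", 1)
	}
	fmt.Println(s.stop()["my.counter"]) // prints 1000
}
```

A design like this does not remove synchronization: every send and receive is itself synchronized inside the channel, and each update also involves a handoff between goroutines, which may be why the channel version measured even slower.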

donglanzhan7151 2016-10-03 20:41
I've also posted this question on the golang-nuts group. The reply from Jesper Louis Andersen explains quite well that Java uses synchronization optimization techniques such as lock escape analysis/lock elision and lock coarsening.

The Java JIT might be taking the lock once and performing multiple updates within it to increase performance. I ran the Java benchmark with -Djava.compiler=NONE, which disables the JIT compiler, and the performance changed dramatically, but that is not a fair comparison.
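To make the idea of lock coarsening concrete, here is a hand-written Go sketch of the same effect: acquiring the lock once for a batch of updates instead of once per update. This is an illustration only, not what the JVM literally does and not part of go-patan; the `counters` type and the `add` and `addBatch` methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// counters is a hypothetical mutex-guarded map, similar in spirit
// to the counters field of the Store in the question.
type counters struct {
	mu     sync.Mutex
	values map[string]int64
}

// add locks and unlocks once per update (fine-grained locking).
func (c *counters) add(key string, v int64) {
	c.mu.Lock()
	c.values[key] += v
	c.mu.Unlock()
}

// addBatch locks once and applies the whole batch inside the critical
// section, which is lock coarsening done by hand.
func (c *counters) addBatch(key string, vs []int64) {
	c.mu.Lock()
	for _, v := range vs {
		c.values[key] += v
	}
	c.mu.Unlock()
}

func main() {
	c := &counters{values: make(map[string]int64)}
	batch := []int64{1, 2, 3, 4}

	for _, v := range batch {
		c.add("fine", v) // four lock/unlock pairs
	}
	c.addBatch("coarse", batch) // one lock/unlock pair

	fmt.Println(c.values["fine"], c.values["coarse"]) // prints "10 10"
}
```

Both paths produce the same result; the batched version simply pays for the lock far less often. If a JIT applies a transformation like this inside a tight benchmark loop, the measured cost per update drops sharply, which would fit the explanation given in the answer.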

I assume that many of these optimization techniques have less impact in a production environment.

This answer was selected by the asker as the best answer.

Question asked: 2016-10-02 09:44