dongwuxie7976 2019-02-26 14:15

# Deadlock on range over channel in a pipeline with multiple wait groups

I'm practicing a challenge that calculates factorials by splitting the calculation into 100 groups that run concurrently. I solved a lot of issues with WaitGroups, but in the `calculateFactorial` function I still get a deadlock at the range-over-channel part. I hope someone can point out the issue here, thank you.

``````
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(2) // one Done per pipeline stage
	in := make(chan int)
	out := make(chan float64)

	out = calculateFactorial(genConcurrentGroup(in, &wg), &wg)

	go func() {
		in <- 10
		close(in)
	}()

	fmt.Println(<-out)

	wg.Wait()
}

// split input number into groups
// the result should be a map of [start number, number in group]
// this is not a heavy task so run it in one goroutine
func genConcurrentGroup(c chan int, wg *sync.WaitGroup) chan map[int]int {
	out := make(chan map[int]int)

	go func() {
		// 100 groups
		total := <-c
		wg.Done()
		// element number in group
		elemNumber := total / 100
		extra := total % 100
		result := make(map[int]int)
		if elemNumber > 0 {
			// certain 100 groups
			for i := 1; i <= 99; i++ {
				result[(i-1)*elemNumber+1] = elemNumber
			}
			// the last group starts after the first 99 and absorbs the remainder
			result[99*elemNumber+1] = extra + elemNumber
		} else {
			// less than 100
			for i := 1; i <= total; i++ {
				result[i] = 1
			}
		}

		out <- result
		close(out)
	}()
	return out
}

// takes in all numbers to calculate the multiplied result
// this could be heavy so the 100 groups can run together
func calculateFactorial(nums chan map[int]int, wg *sync.WaitGroup) chan float64 {
	out := make(chan float64)

	go func() {
		total := <-nums
		wg.Done()
		fmt.Println(total)

		oneResult := make(chan float64)

		var wg2 sync.WaitGroup

		for k, v := range total {
			fmt.Printf("%d %d\n", k, v)
			wg2.Add(1) // one Done per group goroutine
			go func(k int, v int) {
				t := 1.0
				for i := 0; i < v; i++ {
					t = t * (float64(k) + float64(i))
				}
				fmt.Println(t)
				oneResult <- t
				wg2.Done()
			}(k, v)
		}

		wg2.Wait()
		close(oneResult)

		result := 1.0
		for n := range oneResult { // DEADLOCK HERE! Why?
			result *= n
		}

		fmt.Printf("Result: %f\n", result)

		out <- result

	}()
	return out
}
``````

Update:

Thanks to Jessé Catrinck's answer, which fixed the issue in the code above by simply changing `oneResult` to a buffered channel. However, https://stackoverflow.com/a/15144455/921082 contains this quote:

> You should never add buffering merely to fix a deadlock. If your program deadlocks, it's far easier to fix by starting with zero buffering and think through the dependencies. Then add buffering when you know it won't deadlock.

So could anyone help me figure out how to do this without a buffered channel? Is it possible?

Furthermore, I did some research on what exactly causes a deadlock.

For example, from https://stackoverflow.com/a/18660709/921082:

> If the channel is unbuffered, the sender blocks until the receiver has received the value. If the channel has a buffer, the sender blocks only until the value has been copied to the buffer; if the buffer is full, this means waiting until some receiver has retrieved a value.
>
> Said otherwise:
>
> 1. when a channel is full, the sender waits for another goroutine to make some room by receiving
> 2. you can see an unbuffered channel as an always full one: there must be another goroutine to take what the sender sends.
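The quoted rules can be checked with a small sketch. It assumes a hypothetical helper `trySend` (not part of the question's code) that uses `select` with a `default` case to attempt a send without blocking; wherever it reports `false`, a plain `ch <- v` would have blocked.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
// A plain `ch <- v` would block in exactly the cases where this returns false.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 1)

	// No goroutine is receiving, so the send on the unbuffered channel cannot proceed.
	fmt.Println("unbuffered:", trySend(unbuffered, 1)) // false

	// The buffer has one free slot, so the first send completes immediately.
	fmt.Println("buffered, empty:", trySend(buffered, 1)) // true

	// The buffer is now full; the sender would have to wait for a receiver.
	fmt.Println("buffered, full:", trySend(buffered, 2)) // false
}
```

In the question's code every `oneResult <- t` is in the first situation: the channel is unbuffered and nothing is receiving yet.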

So in my original code, the deadlock is probably caused by one of the following:

1. the range over the channel is not receiving?

2. the range over the channel is not receiving in a separate goroutine?

3. `oneResult` is not properly closed, so the range over the channel doesn't know where the end is?

For number 3, I don't know whether there is anything wrong with closing `oneResult` before ranging over it, since this pattern appears in many examples on the internet. If it is number 3, could something be wrong with the wait group?
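On hypothesis 3: closing a channel before ranging over it is fine by itself, because values already sitting in the buffer are still delivered after `close`. The sketch below is a reduced version of the buffered variant; `productBuffered` and its inputs are made-up names for illustration, not code from the question. It only works because the capacity equals the number of senders, so every send completes without a receiver and `wg.Wait()` can return.

```go
package main

import (
	"fmt"
	"sync"
)

// productBuffered multiplies the inputs, sending each value from its own goroutine.
// The channel capacity equals the number of senders, so no send ever blocks.
func productBuffered(values []float64) float64 {
	results := make(chan float64, len(values))
	var wg sync.WaitGroup

	for _, v := range values {
		wg.Add(1)
		go func(v float64) {
			defer wg.Done()
			results <- v // lands in the buffer; no receiver is needed yet
		}(v)
	}

	wg.Wait()      // returns because every send has already completed
	close(results) // closing does not discard buffered values

	product := 1.0
	for r := range results { // drains the buffer, then stops at the close
		product *= r
	}
	return product
}

func main() {
	fmt.Println(productBuffered([]float64{2, 3, 4})) // 24
}
```

With an unbuffered `results` channel the same wait, close, range order would block at `wg.Wait()`, which is the situation in the question.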

I found another article very similar to my situation, https://robertbasic.com/blog/buffered-vs-unbuffered-channels-in-golang/. In its second lesson learned, the author uses a `for { select {} }` infinite loop as an alternative to range over, which seems to have solved his problem.

``````
go func() {
	for {
		select {
		case p := <-pch:
			findcp(p)
		}
	}
}()
``````

> Lesson number 2: an unbuffered channel can't hold on to values (yah, it's right there in the name "unbuffered"), so whatever is sent to that channel, it must be received by some other code right away. That receiving code must be in a different goroutine because one goroutine can't do two things at the same time: it can't send and receive; it must be one or the other.
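A minimal sketch of that lesson, using hypothetical `squares` and `sum` functions: the sender runs in its own goroutine and closes the channel when it is finished, so the receiver can use a plain `range` and stop at the close. By contrast, a `for { select { ... } }` loop with a single receive case does not exit on its own when the channel is closed; it keeps receiving zero values unless it checks the second `ok` result.

```go
package main

import "fmt"

// squares sends the squares of 1..n on an unbuffered channel from its own
// goroutine and closes the channel once the last value has been handed over.
func squares(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i * i // blocks until the receiver takes the value
		}
	}()
	return out
}

// sum receives until the channel is closed and adds up what it got.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(squares(3))) // 1 + 4 + 9 = 14
}
```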

Thanks

#### 2 answers

• dongmie3526 2019-02-26 16:40

The deadlock isn't in the range-over-channel loop. If you run the code on the playground, the top of the stack trace shows that the error is caused by `wg2.Wait` (line 88 on the playground, as pointed to by the stack trace). The stack trace also lists all the goroutines that haven't finished because of the deadlock: `oneResult <- t` never completes, so none of the goroutines started in the loop ever finish.

So the main problem is here:

``````
wg2.Wait()
close(oneResult)

// ...

for n := range oneResult {
	// ...
``````

Also, ranging over an already-closed unbuffered channel is presumably not what you want, since the loop would receive nothing. However, even if you didn't close the channel, that loop would never start, because `wg2.Wait()` will wait until every goroutine is done.

``````
oneResult <- t
wg2.Done()
``````

But the goroutines will never be done, because each one relies on the loop already running. The line `oneResult <- t` will not complete unless something on the other side is receiving from that channel, which is your loop; however, that range-over-channel loop is still waiting for `wg2.Wait()` to complete.

So essentially you have a "circular dependency" between the channel's sender and receiver.

To fix the issue, you need to allow the loop to start receiving from the channel while still making sure that the channel is closed once the senders are done. You can do this by wrapping the two wait-and-close lines in their own goroutine.

https://play.golang.com/p/rwwCFVszZ6Q
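For readers who cannot open the playground link, here is a reduced sketch of the wait-and-close-in-a-goroutine fix. It is not the playground code itself; `multiplyGroups` is a hypothetical stand-in for the inner part of `calculateFactorial`, taking the same map of start number to group size, and the channel stays unbuffered.

```go
package main

import (
	"fmt"
	"sync"
)

// multiplyGroups multiplies k*(k+1)*...*(k+v-1) for every (k, v) entry,
// one goroutine per group, and combines the partial products.
func multiplyGroups(groups map[int]int) float64 {
	oneResult := make(chan float64) // unbuffered, as in the question
	var wg2 sync.WaitGroup

	for k, v := range groups {
		wg2.Add(1)
		go func(k, v int) {
			defer wg2.Done()
			t := 1.0
			for i := 0; i < v; i++ {
				t *= float64(k + i)
			}
			oneResult <- t // completes once the range loop below receives it
		}(k, v)
	}

	// Wait and close in a separate goroutine so the receiving loop can start
	// right away; the channel is closed only after every sender has finished.
	go func() {
		wg2.Wait()
		close(oneResult)
	}()

	result := 1.0
	for n := range oneResult {
		result *= n
	}
	return result
}

func main() {
	// Groups {1,2,3} and {4,5} together give 5! = 120.
	fmt.Println(multiplyGroups(map[int]int{1: 3, 4: 2})) // 120
}
```

The circular dependency is gone because the receiver no longer waits on `wg2`: senders wait for the loop, the closer goroutine waits for the senders, and the loop waits only for values or the close.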

This answer was selected by the asker as the best answer.
