Deadlock with buffered channel

I have some code that acts as a job dispatcher and collates a large amount of data from lots of TCP sockets. The code is the result of an approach suggested in "Large number of transient objects - avoiding contention", and it largely works: CPU usage is down a huge amount, and locking is no longer an issue either.

From time to time my application locks up, and the "Channel length" log line is the only thing that keeps repeating, because data is still coming in from my sockets. However, the count stays at 5000 and no downstream processing takes place.

I think the issue might be a race condition, and the line it is possibly hanging on is `channel <- msg` inside the select in jobDispatcher. The trouble is, I can't work out how to verify this.

I suspect that, because select picks among ready cases at random, the goroutine is returning before shutdownChan gets a chance to be processed. Then data hits inboundFromTCP and it blocks!
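
The premise that select has no priority order can be checked directly: the Go specification says that when several cases are ready, one is chosen by uniform pseudo-random selection. A small sketch (the function name pickCounts is hypothetical) that makes two cases ready on every iteration and counts which one runs:

```go
package main

import "fmt"

// pickCounts runs a select with two cases that are both ready on every
// iteration and counts which case is chosen. Neither case is favoured,
// so over many iterations both counts end up well above zero.
func pickCounts(iterations int) (a, b int) {
	chA := make(chan struct{}, 1)
	chB := make(chan struct{}, 1)
	for i := 0; i < iterations; i++ {
		chA <- struct{}{}
		chB <- struct{}{}
		select {
		case <-chA:
			a++
			<-chB // drain the other channel so both start empty next time
		case <-chB:
			b++
			<-chA
		}
	}
	return a, b
}

func main() {
	a, b := pickCounts(10000)
	fmt.Println("case A:", a, "case B:", b) // roughly 5000 each
}
```

So a dispatcher select can keep choosing inboundFromTCP while a shutdown message is also waiting; ordering of cases in the source gives no guarantee.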

Someone might spot something really obviously wrong here and, hopefully, offer a solution!

```go
import (
    "log"
    "time"
)

// trackingPacket_v1, cullChanMessages and processMLATCandidatesFromChan
// are defined elsewhere in the application.

var MessageQueue = make(chan *trackingPacket_v1, 5000)

func init() {
    go jobDispatcher(MessageQueue)
}

func addMessage(trackingPacket *trackingPacket_v1) {
    // Send the packet to the buffered queue; this blocks once all 5000 slots are full.
    log.Println("Channel length:", len(MessageQueue))
    MessageQueue <- trackingPacket
}

func jobDispatcher(inboundFromTCP chan *trackingPacket_v1) {
    // One unbuffered channel per Avr id, each served by its own processPackets goroutine.
    var channelMap = make(map[string]chan *trackingPacket_v1)

    // Channel on which workers report the id they want to shut down.
    shutdownChan := make(chan string)

    for {
        select {
        case msg := <-inboundFromTCP:
            log.Println("Got packet", msg.Avr)
            channel, ok := channelMap[msg.Avr]
            if !ok {
                packetChan := make(chan *trackingPacket_v1)

                channelMap[msg.Avr] = packetChan
                go processPackets(packetChan, shutdownChan, msg.Avr)
                packetChan <- msg
                continue
            }
            channel <- msg // suspected hang point: blocks until the worker receives
        case shutdownString := <-shutdownChan:
            log.Println("Shutting down:", shutdownString)
            channel, ok := channelMap[shutdownString]
            if ok {
                delete(channelMap, shutdownString)
                close(channel)
            }
        }
    }
}

func processPackets(ch chan *trackingPacket_v1, shutdown chan string, id string) {
    var messages = []*trackingPacket_v1{}

    tickChan := time.NewTicker(time.Second * 1)
    defer tickChan.Stop()

    hasCheckedData := false

    for {
        select {
        case msg := <-ch:
            log.Println("Got a message for", id)
            messages = append(messages, msg)
            hasCheckedData = false
        case <-tickChan.C:
            messages = cullChanMessages(messages)
            if len(messages) == 0 {
                messages = nil
                shutdown <- id // blocks until jobDispatcher receives from shutdownChan
                return
            }

            // No point running checking when packets have not changed!!
            if !hasCheckedData {
                processMLATCandidatesFromChan(messages)
                hasCheckedData = true
            }
        // time.After creates a new timer on every loop iteration, so while the
        // ticker keeps firing every second this case never reaches 60 seconds.
        case <-time.After(60 * time.Second):
            log.Println("This channel has been around for 60 seconds which is too much, kill it")
            messages = nil
            shutdown <- id
            return
        }
    }
}
```

Update 01/20/17

I tried reworking it with channelMap as a global protected by a mutex, but it still deadlocked.

I slightly tweaked the code. It still locks up, but I don't see how this version can! https://play.golang.org/p/PGpISU4XBJ

Update 01/21/17

After some recommendations, I put this into a standalone working example so people can see it: https://play.golang.org/p/88zT7hBLeD

It is a long-running process, so it needs to be run locally on a machine, because the playground kills it. Hopefully this helps get to the bottom of it!

Accepted answer by duanbei3747, 2017-01-21 00:37
I'm guessing your problem is getting stuck on `channel <- msg` at the same time as the other goroutine is doing `shutdown <- id`.

Since neither `channel` nor `shutdown` is buffered, each send blocks until a receiver is ready. The two goroutines can therefore deadlock, each waiting for the other side to become available.

There are a couple of ways to fix it. You could declare both of those channels with a buffer of 1.
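
What a buffer of 1 changes can be seen with a non-blocking send (select with a default case). In this sketch (trySend is a hypothetical helper), one send completes with nobody receiving, but a second send on the same channel would block again, so the buffer only absorbs a single pending value.

```go
package main

import "fmt"

// trySend reports whether a send on ch completes immediately, without a
// receiver, using select with a default case so it never blocks.
func trySend(ch chan string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan string)
	buffered := make(chan string, 1)

	fmt.Println(trySend(unbuffered, "a")) // false: no receiver is waiting
	fmt.Println(trySend(buffered, "a"))   // true: the value is parked in the buffer
	fmt.Println(trySend(buffered, "b"))   // false: the single slot is already full
}
```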

Or, instead of signalling by sending a shutdown message, you could do what Go's context package does and signal shutdown by closing the shutdown channel. Look at https://golang.org/pkg/context/, especially WithCancel, WithDeadline and the Done method.
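
The reason closing works as a signal: close never blocks the closer, and every goroutine receiving from a closed channel returns immediately, whereas a send is handed to exactly one receiver and waits for it. A minimal sketch (broadcastShutdown and the counter are illustrative only):

```go
package main

import (
	"fmt"
	"sync"
)

// broadcastShutdown starts n goroutines that wait on done, closes done
// once, and returns how many goroutines observed the close.
func broadcastShutdown(n int) int {
	done := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex
	stopped := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			mu.Lock()
			stopped++
			mu.Unlock()
		}()
	}

	close(done) // one close releases all n waiters and never blocks
	wg.Wait()
	return stopped
}

func main() {
	fmt.Println("workers stopped:", broadcastShutdown(5)) // workers stopped: 5
}
```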

You might be able to use context to replace your own shutdown channel and timeout code.

JimB also has a point about shutting down the goroutine while messages might still be arriving on its channel. What you should do is send the shutdown message (or close the channel, or cancel the context) and then keep processing messages until your `ch` channel is closed (detect that with `case msg, ok := <-ch:`). That close happens after the dispatcher has received the shutdown.

That way you receive every message that arrived before the shutdown actually took effect, and you should avoid a second deadlock.

This answer was accepted by the asker as the best answer.
