I have some code that is a job dispatcher and is collating a large amount of data from lots of TCP sockets. This code is a result of an approach to Large number of transient objects - avoiding contention and it largely works with CPU usage down a huge amount and locking not an issue now either.
From time to time my application locks up and the "Channel length" log is the only thing that keeps repeating as data is still coming in from my sockets. However the count remains at 5000 and no downstream processing is taking place.
I think the issue might be a race condition and the line it is possibly getting hung up on is channel <- msg
within the select
of the jobDispatcher
. Trouble is I can't work out how to verify this.
I suspect that as select can take items at random the goroutine is returning and the shutdownChan doesn't have a chance to process. Then data hits inboundFromTCP and it blocks!
Someone might spot something really obviously wrong here. And offer a solution hopefully!?
var MessageQueue = make(chan *trackingPacket_v1, 5000)
func init() {
go jobDispatcher(MessageQueue)
}
func addMessage(trackingPacket *trackingPacket_v1) {
// Send the packet to the buffered queue!
log.Println("Channel length:", len(MessageQueue))
MessageQueue <- trackingPacket
}
func jobDispatcher(inboundFromTCP chan *trackingPacket_v1) {
var channelMap = make(map[string]chan *trackingPacket_v1)
// Channel that listens for the strings that want to exit
shutdownChan := make(chan string)
for {
select {
case msg := <-inboundFromTCP:
log.Println("Got packet", msg.Avr)
channel, ok := channelMap[msg.Avr]
if !ok {
packetChan := make(chan *trackingPacket_v1)
channelMap[msg.Avr] = packetChan
go processPackets(packetChan, shutdownChan, msg.Avr)
packetChan <- msg
continue
}
channel <- msg
case shutdownString := <-shutdownChan:
log.Println("Shutting down:", shutdownString)
channel, ok := channelMap[shutdownString]
if ok {
delete(channelMap, shutdownString)
close(channel)
}
}
}
}
func processPackets(ch chan *trackingPacket_v1, shutdown chan string, id string) {
var messages = []*trackingPacket_v1{}
tickChan := time.NewTicker(time.Second * 1)
defer tickChan.Stop()
hasCheckedData := false
for {
select {
case msg := <-ch:
log.Println("Got a messages for", id)
messages = append(messages, msg)
hasCheckedData = false
case <-tickChan.C:
messages = cullChanMessages(messages)
if len(messages) == 0 {
messages = nil
shutdown <- id
return
}
// No point running checking when packets have not changed!!
if hasCheckedData == false {
processMLATCandidatesFromChan(messages)
hasCheckedData = true
}
case <-time.After(time.Duration(time.Second * 60)):
log.Println("This channel has been around for 60 seconds which is too much, kill it")
messages = nil
shutdown <- id
return
}
}
}
Update 01/20/16
I tried to rework with the channelMap
as a global with some mutex locking but it ended up deadlocking still.
Slightly tweaked the code, still locks but I don't see how this one does!! https://play.golang.org/p/PGpISU4XBJ
Update 01/21/17 After some recommendations I put this into a standalone working example so people can see. https://play.golang.org/p/88zT7hBLeD
It is a long running process so will need running locally on a machine as the playground kills it. Hopefully this will help get to the bottom of it!