I have tried to implement a graceful shutdown of a Go server, as described in this blog post: http://grisha.org/blog/2014/06/03/graceful-restart-in-golang/. The main parts are the following.
Custom listener:
```go
var httpWg sync.WaitGroup // initialised in the other part

type gracefulListener struct {
	net.Listener
	stop    chan error
	stopped bool
}

func newGracefulListener(l net.Listener) (gl *gracefulListener) {
	gl = &gracefulListener{Listener: l, stop: make(chan error)}
	go func() {
		_ = <-gl.stop
		gl.stopped = true
		gl.stop <- gl.Listener.Close()
	}()
	return
}

func (gl *gracefulListener) Accept() (c net.Conn, err error) {
	c, err = gl.Listener.Accept()
	if err != nil {
		return
	}

	c = gracefulConn{Conn: c} // wrap using our custom connection

	httpWg.Add(1) // increase the counter

	return
}

func (gl *gracefulListener) Close() error {
	if gl.stopped {
		return syscall.EINVAL
	}
	gl.stop <- nil
	return <-gl.stop
}

func (gl *gracefulListener) File() *os.File {
	tl := gl.Listener.(*net.TCPListener)
	fl, _ := tl.File()
	return fl
}
```
Custom Conn:
```go
type gracefulConn struct {
	net.Conn
}

func (w gracefulConn) Close() error {
	httpWg.Done() // <- panics sometimes
	return w.Conn.Close()
}
```
The idea is that when the program receives SIGTERM, it stops accepting new connections and calls `httpWg.Wait()` to wait for the existing connections to finish.
This approach works locally, but when I deploy it, I sometimes get a panic in `gracefulConn.Close()` on the `httpWg.Done()` line:

```
panic: sync: negative WaitGroup counter
```

The panic does not happen when I stop the server; it happens during normal serving.
How is it possible that there are more `Close()` calls than `Accept()` calls? Or am I missing something?
P.S. I have tried adding a `stopped` field and a mutex to `gracefulConn`, so that `Close` locks the mutex and checks `stopped` to ensure the connection is closed only once. However, I still got the same panic.