I have a curious issue, which happens in production very infrequently.
I have a global HTTP client so that connections are reused:
    DefaultClient = &http.Client{
        Transport: &http.Transport{
            Proxy: http.ProxyFromEnvironment,
            DialContext: (&net.Dialer{
                Timeout:   500 * time.Millisecond,
                KeepAlive: 1 * time.Minute,
            }).DialContext,
            TLSHandshakeTimeout: 500 * time.Millisecond,
            MaxIdleConns:        3000,
            MaxIdleConnsPerHost: 3000,
            IdleConnTimeout:     1 * time.Minute,
        },
        Timeout: 500 * time.Millisecond,
    }
Every so often a machine becomes unresponsive because it cannot recover from a panic, which appears to be related to mutex locking and unlocking inside Go's standard net/http library. Here is a goroutine from the dump, blocked in semacquire for 7 minutes:
goroutine 47888935 [semacquire, 7 minutes]:
sync.runtime_SemacquireMutex(0xc00040126c, 0x0)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000401268)
/usr/local/go/src/sync/mutex.go:134 +0xff
net/http.(*http2clientConnPool).getClientConn(0xc000401260, 0xc028295f00, 0xc0281de820, 0x12, 0xc00027a400, 0xc0283266a8, 0x6aafa0, 0xc00027a410)
/usr/local/go/src/net/http/h2_bundle.go:760 +0x65
net/http.http2noDialClientConnPool.GetClientConn(0xc000401260, 0xc028295f00, 0xc0281de820, 0x12, 0xc0281de820, 0x12, 0x0)
/usr/local/go/src/net/http/h2_bundle.go:954 +0x4e
net/http.(*http2Transport).RoundTripOpt(0xc00027a410, 0xc028295f00, 0x0, 0x200000000000000, 0x4137ec, 0xb940aa)
/usr/local/go/src/net/http/h2_bundle.go:7031 +0x105
net/http.(*http2Transport).RoundTrip(0xc00027a410, 0xc028295f00, 0x1, 0xc028307800, 0x0)
/usr/local/go/src/net/http/h2_bundle.go:6999 +0x3a
net/http.http2noDialH2RoundTripper.RoundTrip(0xc00027a410, 0xc028295f00, 0xc0282cf800, 0x5, 0xc00059c2c8)
/usr/local/go/src/net/http/h2_bundle.go:1019 +0x39
net/http.(*Transport).roundTrip(0xc000177680, 0xc028295f00, 0xc028267a70, 0xc02801ff88, 0xc02801ff90)
/usr/local/go/src/net/http/transport.go:415 +0xd4c
net/http.(*Transport).RoundTrip(0xc000177680, 0xc028295f00, 0xc000177680, 0xbf0c9388b63791ca, 0x2122772ff0fc)
/usr/local/go/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc028295e00, 0xc99600, 0xc000177680, 0xbf0c9388b63791ca, 0x2122772ff0fc, 0x11d03a0, 0xc027ef3480, 0xbf0c9388b63791ca, 0xc028326cb8, 0x1)
/usr/local/go/src/net/http/client.go:250 +0x14b
net/http.(*Client).send(0xc000400360, 0xc028295e00, 0xbf0c9388b63791ca, 0x2122772ff0fc, 0x11d03a0, 0xc027ef3480, 0x0, 0x1, 0xc000518000)
/usr/local/go/src/net/http/client.go:174 +0xfa
net/http.(*Client).do(0xc000400360, 0xc028295e00, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/client.go:641 +0x2a8
net/http.(*Client).Do(0xc000400360, 0xc028295e00, 0x10, 0xb933f5, 0x4)
/usr/local/go/src/net/http/client.go:509 +0x35
It happens so infrequently, and only under high production traffic, that it is hard to reproduce.
Has anyone experienced this before, or does anyone have insight into the possible cause? It seems tightly coupled with the HTTP client.
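Because the problem cannot be reproduced on demand, it may help to capture the stacks of all goroutines when a machine stalls, so the goroutine holding the mutex can be found. The following is a minimal sketch using runtime/pprof. The helper name goroutineDump is hypothetical, and how the dump gets triggered in production (signal handler, debug endpoint, watchdog) is left open.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// goroutineDump is a hypothetical helper that returns the stacks of all
// goroutines. Debug level 2 uses the same text format the runtime prints
// when a program dies from an unrecovered panic.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	dump, err := goroutineDump()
	if err != nil {
		fmt.Fprintln(os.Stderr, "goroutine dump failed:", err)
		os.Exit(1)
	}
	fmt.Print(dump)
}
```

In a full dump, the goroutine to look for is the one that sits inside an http2clientConnPool method without being in the semacquire state, since that is the likely lock holder.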