I wrote a tcp handler in golang, about 300 connections per second. There was no problem with the program just released to production. But after running for about 10 days, I see that the cpu usage is up to 100%. I used the golang tool "go tool pprof" to get the information of cpu usage :
File: gateway-w
Type: cpu
Time: Nov 7, 2018 at 5:38pm (CST)
Duration: 30.14s, Total samples = 30.13s ( 100%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 27.42s, 91.01% of 30.13s total
Dropped 95 nodes (cum <= 0.15s)
Showing top 10 nodes out of 28
flat flat% sum% cum cum%
24.69s 81.94% 81.94% 24.69s 81.94% runtime._ExternalCode /usr/local/go/src/runtime/proc.go
0.57s 1.89% 83.84% 0.57s 1.89% runtime.lock /usr/local/go/src/runtime/lock_futex.go
0.56s 1.86% 85.70% 0.56s 1.86% runtime.unlock /usr/local/go/src/runtime/lock_futex.go
0.26s 0.86% 86.56% 5.37s 17.82% gateway-w/connect/connect-tcp.tcpStartSession /go/src/gateway-w/connect/connect-tcp/tcp_framework.go
0.25s 0.83% 87.39% 1.67s 5.54% net.(*conn).Read /usr/local/go/src/net/net.go
0.24s 0.8% 88.18% 1.41s 4.68% net.(*netFD).Read /usr/local/go/src/net/fd_unix.go
0.23s 0.76% 88.95% 0.23s 0.76% runtime.nanotime /usr/local/go/src/runtime/sys_linux_amd64.s
0.22s 0.73% 89.68% 0.22s 0.73% internal/poll.(*fdMutex).incref /usr/local/go/src/internal/poll/fd_mutex.go
0.21s 0.7% 90.38% 0.21s 0.7% internal/poll.(*fdMutex).rwunlock /usr/local/go/src/internal/poll/fd_mutex.go
0.19s 0.63% 91.01% 0.19s 0.63% internal/poll.(*fdMutex).rwlock /usr/local/go/src/internal/poll/fd_mutex.go
my tcpHandle code is like this:
func tcpStartSession(conn net.Conn) {
defer closeTcp(conn)
var (last, n int
err error
buff []byte
)
last, n, err, buff =
0, 0, nil,
make([]byte, MAX_PACKET_LEN)
for {
// set read timeout
conn.SetReadDeadline(time.Now().Add(time.Duration(tcpTimeOutSec) * time.Second))
n, err = conn.Read(buff[last:])
if err != nil {
log.Info("tcp read error maybe timeout , ", err)
break
}
if n == 0 {
log.Debug("empty packet, continue")
continue
}
log.Debug("read bytes ", n)
log.Info("get a raw package:", hex.EncodeToString(buff[:last+n]))
last += n
...
for {
if last == 0 {
break
}
ret, err := protoHandle.IsWhole(buff[:last])
if err != nil {
log.Warn("proto handle check iswhole error", err)
}
log.Debug("rest buffer len = %d
", ret)
if ret < 0 {
//wait for more tcp fragment.
break
}
packetLen := last - ret
packetBuf := make([]byte, packetLen)
copy(packetBuf, buff[:packetLen])
last = ret
if last > 0 {
copy(buff, buff[packetLen:packetLen+last])
}
...
}
}
}
I can't understand what runtime._ExternalCode means. This is the function inside golang.
my golang version is :go version go1.9.2 linux/amd64
my program is running on docker
my docker version is : 1.12.6
I hope someone can help me. Thank you very much!
I tried to upgrade the golang version to 1.10.3. After running for more than half a year, there was no problem. Recently, the same problem occurred, but I have not changed the program code. I suspect that there is a problem with this code:
conn.SetReadDeadline(time.Now().Add(time.Duration(tcpTimeOutSec) * time.Second))
Need your help, thank you.