I have a SOCKS5 proxy server written in Go. The daemon listens on 10000 ports, from 15000 to 25000, so these ports form a proxy list. Recently we started testing it with some clients and ended up with about 500 rps on 5000 of these ports. That is not much, I think, but I immediately ran into a bunch of problems.
The server is Ubuntu 18, 8 cores, 32G RAM, 1Gb network. I observe almost 800% CPU all the time and a constantly rising number of sockets in the CLOSE_WAIT and TIME_WAIT states. I have investigated the code carefully for about a week but could not find any problem; connections are closed everywhere.
pprof says that this is all about system calls, more precisely socket reads. ReadAtLeast here reads the first 3 bytes of the SOCKS5 request to determine the request type.
func (s *Server) Serve(conn net.Conn) error {
	defer conn.Close() // does not seem to work either
	_ = conn.SetDeadline(time.Now().Add(time.Second * 30)) // doesn't work

	request, err := NewRequest(conn)
	if err != nil {
		return err
	}

	// Process the client request
	return s.handleRequest(request, conn)
}

func NewRequest(bufConn io.Reader) (*Request, error) {
	// Read the first 3 bytes of the request
	header := []byte{0, 0, 0}
	if _, err := io.ReadAtLeast(bufConn, header, 3); err != nil {
		return nil, fmt.Errorf("Failed to get command version: %v", err)
	}
	// ...
}
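To make the header read easier to test in isolation, here is a minimal, self-contained sketch of the same step. It is not the server's actual code: `readHeader`, `demo`, and the 200 ms `headerTimeout` are hypothetical names and values chosen for illustration, and an in-memory `net.Pipe` stands in for a real client. It uses `io.ReadFull`, which for a 3-byte buffer behaves the same as `io.ReadAtLeast(r, buf, 3)`, and checks the error from `SetReadDeadline` instead of discarding it.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// headerTimeout is an assumed value for this sketch, not the server's setting.
const headerTimeout = 200 * time.Millisecond

// readHeader (hypothetical helper) reads exactly the first 3 bytes from conn
// and gives up once the read deadline expires.
func readHeader(conn net.Conn, timeout time.Duration) ([]byte, error) {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return nil, fmt.Errorf("set read deadline: %w", err)
	}
	header := make([]byte, 3)
	if _, err := io.ReadFull(conn, header); err != nil {
		return nil, fmt.Errorf("read request header: %w", err)
	}
	return header, nil
}

// demo feeds payload to readHeader through an in-memory pipe.
// The client side writes the payload and then stays silent.
func demo(payload []byte) ([]byte, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	go func() { _, _ = client.Write(payload) }()

	return readHeader(server, headerTimeout)
}

func main() {
	// Arbitrary sample bytes; a full header is read successfully.
	header, err := demo([]byte{0x05, 0x01, 0x00})
	fmt.Println(header, err)

	// A client that sends too little and then goes idle hits the deadline.
	_, err = demo([]byte{0x05})
	fmt.Println(err)
}
```

If the second call returns a timeout error in a test like this, the deadline mechanism itself works, and the search can move on to how the error is handled further up the call chain.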
net.ipv4.tcp_fin_timeout=25, so as far as I understand 2MSL is 50 seconds, but it seems the sockets just don't have enough time to close because new ones are coming in too fast. That is about TIME_WAIT. What is wrong with CLOSE_WAIT, I have no idea. I definitely close the connection, but it seems I am not getting FIN_ACK from the client.
As a temporary workaround I added a restart command to crontab that runs every 15 minutes, so all CLOSE_WAIT connections are dropped and the TIME_WAIT count decreases a little, but this means downtime, etc.
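To watch how the CLOSE_WAIT and TIME_WAIT counts change over time, something like the following sketch could be used. It counts sockets per state from the `st` column of `/proc/net/tcp` (IPv4 only; `/proc/net/tcp6` has the same layout). The names `countStates`, `stateNames`, and `sampleTable` are made up for this example, and the sample rows are invented and shortened; they are not output from the real server.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// stateNames maps the hex codes in the "st" column to TCP state names.
var stateNames = map[string]string{
	"01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
	"04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
	"07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
	"0A": "LISTEN", "0B": "CLOSING",
}

// sampleTable is invented, shortened data in the /proc/net/tcp layout.
const sampleTable = `  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:3A98 00000000:0000 0A 00000000:00000000
   1: 0100007F:3A98 0100007F:C350 08 00000000:00000000
   2: 0100007F:3A99 0100007F:C351 08 00000000:00000000
   3: 0100007F:3A9A 0100007F:C352 06 00000000:00000000
`

// countStates (hypothetical helper) returns the number of sockets per state.
func countStates(table string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(table))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Skip blank lines, short lines, and the column header row.
		if len(fields) < 4 || fields[0] == "sl" {
			continue
		}
		code := strings.ToUpper(fields[3])
		name, ok := stateNames[code]
		if !ok {
			name = "UNKNOWN(" + code + ")"
		}
		counts[name]++
	}
	return counts
}

func main() {
	// Use the live table on Linux; fall back to the sample elsewhere.
	table := sampleTable
	if data, err := os.ReadFile("/proc/net/tcp"); err == nil {
		table = string(data)
	}
	for state, n := range countStates(table) {
		fmt.Printf("%-12s %d\n", state, n)
	}
}
```

Running this periodically and logging the result shows whether the counts keep growing or level off under constant load.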