dpg76975 2019-03-26 08:36

Unusually high number of TCP connection timeout errors

I am using a Go TCP Client to connect to our Go TCP Server.

I am able to connect to the Server and run commands properly, but every so often my TCP Client reports an unusually high number of consecutive TCP connection errors, either when connecting to our TCP Server or when sending a message once connected:

dial tcp kubernetes_node_ip:exposed_kubernetes_port:
connectex: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because connected
host has failed to respond.

read tcp unfamiliar_ip:unfamiliar_port->kubernetes_node_ip:exposed_kubernetes_port
wsarecv: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because connected
host has failed to respond.

I say "unusually high" because I assume these errors should be rare (about 5 or fewer per hour). Note that I am not ruling out connection instability as the cause, as I have also noticed that it is possible to run several commands in rapid succession without any errors.

However, I am still going to post my code in case I am doing something wrong.

Below is the code that my TCP Client uses to connect to our server:

serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
if err != nil {     
    fmt.Println(err)
    return
}

// Never stop asking for commands from the user.
for {
    // Connect to the server.
    serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
    if err != nil {         
        fmt.Println(err)
        continue
    }

    defer serverConnection.Close()

    // Added to prevent connection timeout errors, but doesn't seem to be helping
    // because said errors happen within just 1 or 2 minutes.
    err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
    if err != nil {         
        fmt.Println(err)
        continue
    }

    // Ask for a command from the user and convert to JSON bytes...

    // Send message to server.
    _, err = serverConnection.Write(clientMsgBytes)
    if err != nil {
        err = merry.Wrap(err)
        fmt.Println(merry.Details(err))
        continue
    }

    err = serverConnection.CloseWrite()
    if err != nil {
        err = merry.Wrap(err)
        fmt.Println(merry.Details(err))
        continue
    }

    // Wait for a response from the server and print...
}

Below is the code that our TCP Server uses to accept client requests:

// We only supply the port so the IP can be dynamically assigned:
serverAddress, err := net.ResolveTCPAddr("tcp", ":"+server_port)
if err != nil {     
    return err
}

tcpListener, err := net.ListenTCP("tcp", serverAddress)
if err != nil {     
    return err
}

defer tcpListener.Close()

// Never stop listening for client requests.
for {
    clientConnection, err := tcpListener.AcceptTCP()
    if err != nil {         
        fmt.Println(err)
        continue
    }

    go func() {
        // Add client connection to Job Queue.
        // Note that `clientConnections` is a buffered channel with a size of 1500.
        // Since I am the only user connecting to our server right now, I do not think
        // this is a channel blocking issue.
        clientConnections <- clientConnection
    }()
}

Below is the code that our TCP Server uses to process client requests:

defer clientConnection.Close()

// Added to prevent connection timeout errors, but doesn't seem to be helping
// because said errors happen within just 1 or 2 minutes.
err := clientConnection.SetDeadline(time.Now().Add(10 * time.Minute))
if err != nil {     
    return err
}

// Read full TCP message.
// Does not stop until an EOF is reported by `CloseWrite()`
clientMsgBytes, err := ioutil.ReadAll(clientConnection)
if err != nil {
    err = merry.Wrap(err)
    return nil, err
}

// Process the message bytes...

My questions are:

  1. Am I doing something wrong in the above code, or is the above decent enough for basic TCP Client-Server operations?

  2. Is it okay that both the TCP Client and TCP Server have code that defers closing their one connection?

  3. I seem to recall that calling defer inside a loop does nothing. How do I properly close Client connections before starting new ones?

Some extra information:

  • Said errors are not logged by the TCP Server, so aside from connection instabilities, this might also be a Kubernetes/Docker-related issue.

1 answer

  • donglanzhan7151 2019-03-26 09:18

    This piece of code does not behave the way you expect. A deferred call runs when the surrounding function returns, not when a loop iteration ends. So every pass through the loop opens a new connection, and none of them is closed until the function returns. As far as I can see, you are accumulating a lot of open connections on the client side, and that could be the problem.

    serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
    if err != nil {     
        fmt.Println(err)
        return
    }
    
    // Never stop asking for commands from the user.
    for {
        // Connect to the server.
        serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
        if err != nil {         
            fmt.Println(err)
            continue
        }
    
        defer serverConnection.Close()
    
        // Added to prevent connection timeout errors, but doesn't seem to be helping
        // because said errors happen within just 1 or 2 minutes.
        err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
        if err != nil {         
            fmt.Println(err)
            continue
        }
    
        // Ask for a command from the user and send to the server...
    
        // Wait for a response from the server and print...
    }
    

    I suggest writing it this way:

    func start() {
        serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
        if err != nil {
            fmt.Println(err)
            return
        }
        for {
            // Each call owns exactly one connection; start reconnects after an error.
            if err := listen(serverAddress); err != nil {
                fmt.Println(err)
            }
        }
    }

    func listen(serverAddress *net.TCPAddr) error {
        // Connect to the server.
        serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
        if err != nil {
            return err
        }

        // Runs when listen returns, so the connection is closed before start dials again.
        defer serverConnection.Close()

        // Never stop asking for commands from the user.
        for {
            // Added to prevent connection timeout errors, but doesn't seem to be helping
            // because said errors happen within just 1 or 2 minutes.
            err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
            if err != nil {
                return err
            }

            // Ask for a command from the user and send to the server...

            // Wait for a response from the server and print...
        }
    }
    

    Also, consider keeping a single connection open, or a pool of connections, instead of opening and closing a connection for every message. To send a message, take a connection from the pool (or use the single connection), write the message, wait for the response, and then release the connection back to the pool.

    Something like this:

    res, err := c.Send([]byte(`my message`))
    if err != nil {
        // handle err
    }
    
    // the implementation of send
    func (c *Client) Send(msg []byte) ([]byte, error) {
        conn, err := c.pool.Get() // returns a connection from the pool or starts a new one
        if err != nil {
            return nil, err
        }
        // send your message and wait for response
        // ...
        return response, nil
    }
    