Using only the Go standard library, why do I keep leaking TCP connections in this two-layer architecture?

In this situation, I'm using all standard Go libraries -- net/http, most importantly.

The application consists of two layers. The first layer is the basic web application. The web application serves out the UI, and proxies a bunch of API calls back to the second layer based on username -- so, it's effectively a load balancer with consistent hashing -- each user is allocated to one of these second-layer nodes, and any requests pertaining to that user must be sent to that particular node.

Quick details

These API endpoints in the first layer read in a JSON body, check the username, use it to work out which of the layer 2 nodes owns that user, and forward the JSON body to that node. The forwarding is done with a global http.Client that has timeouts set on it, as appropriate.
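To make the routing step concrete, here is a minimal sketch of mapping a username to a layer 2 node. The node addresses and the pickNode helper are hypothetical, and plain hash-modulo is used as a stand-in; it is not true consistent hashing, which would move fewer users when the node list changes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickNode maps a username to one entry of nodes. Hypothetical helper for
// illustration only; it assumes nodes is non-empty.
func pickNode(username string, nodes []string) string {
	h := fnv.New32a()
	h.Write([]byte(username)) // Write on a hash.Hash never returns an error
	return nodes[h.Sum32()%uint32(len(nodes))]
}

func main() {
	// Assumed addresses; port 8100 matches the log lines in the question.
	nodes := []string{"10.0.0.1:8100", "10.0.0.2:8100", "10.0.0.3:8100"}
	for _, u := range []string{"alice", "bob", "alice"} {
		fmt.Printf("%s -> %s\n", u, pickNode(u, nodes))
	}
}
```

The same username always lands on the same node as long as the node list is unchanged, which is the property the proxy layer relies on.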

The server side does a defer request.Body.Close() in each of the handlers, after checking that no error comes back from the decoder.Decode(&obj) call that unmarshals the JSON. If there is any code path where the body is left unclosed, it isn't one that's likely to be followed very often.

Symptoms

On the node in the second layer (the application server) I get log lines like these, presumably because it is leaking sockets and using up all of its file descriptors:

2019/07/15 16:16:59 http: Accept error: accept tcp [::]:8100: accept4: too many open files; retrying in 1s
2019/07/15 16:17:00 http: Accept error: accept tcp [::]:8100: accept4: too many open files; retrying in 1s

When I run lsof, it prints about 14,000 lines, of which 11,200 are TCP sockets. Looking through the lsof output, nearly all of these TCP sockets are in connection state CLOSE_WAIT, and they are between my application server (the second layer node) and the web server (the first layer node).

Interestingly, nothing seems to go wrong with the web application server (layer 1) during this timeframe.

Why does this happen?

I've seen lots of explanations, but most either point out that you need to configure your own settings on a custom http.Client rather than use the default client, or they tell you to make sure to close the request bodies after reading from them in the layer 2 handlers.

Given all this information, does anyone have any idea what I can do to put this to bed once and for all? Everything I find on the internet attributes this to user error, and while I certainly hope that's the case here, I worry that I've already nailed down every last quirk of the Go standard library I can find.

I've been having trouble pinning down exactly how long it takes for this to happen. The last time it happened, the process had been up for 3 days before I started to see this error, and at that point nothing recovers until I kill and restart the process.

Any help would be hugely appreciated!

EDIT: example of client-side code

Here is an example of what I'm doing in the web application (layer 1) to call the layer 2 node:


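// Shared client: declared once and reused for every proxied call.
var webHttpClient = &http.Client{
    Transport: &http.Transport{
        MaxIdleConnsPerHost: MaxIdleConnections,
    },
    Timeout: time.Second * 20,
}
// ...
                    uri := fmt.Sprintf("http://%s/%s", tsUri, "pms/all-venue-balances")
                    req, err := http.NewRequest("POST", uri, bytes.NewBuffer(b))
                    if err != nil {
                        // The error from NewRequest must be used or the code will not compile.
                        log.Printf("Submit rebal: building request failed: %v\n", err)
                        w.WriteHeader(500)
                        return
                    }
                    resp, err := webHttpClient.Do(req)
                    if err != nil {
                        log.Printf("Submit rebal error 3: %v\n", err)
                        w.WriteHeader(500)
                        return
                    }
                    // Close the response body so the connection can be released.
                    defer resp.Body.Close()

                    // Reading to EOF lets the Transport reuse the connection.
                    body, _ := ioutil.ReadAll(resp.Body)
                    w.WriteHeader(200)
                    w.Write(body)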
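dt2002: Rather than leaving people to keep guessing, it would be more helpful to provide a minimal reproducible example that demonstrates the problem.
about a year ago
dqsh30374: I set MaxIdleConnections=2 for testing purposes (to try to work around this problem). I declare webHttpClient once as a singleton and use it many times, so I shouldn't be doing anything that creates a new http.Transport. Any other ideas?
about a year ago
dousi2553: An http.Transport owns a connection pool. Make sure the application reuses a single http.Transport. What is the value of MaxIdleConnections?
about a year ago
doulan2827: Also make sure you are reusing the client. Each client (or, more precisely, each http.Transport) maintains a connection pool, so don't create a new client for every request.
about a year ago
douhu2898: I've added an example of how I do the client-side call. Do you see any problems with it? TIA!
about a year ago
dongmeng2509: As @CeriseLimón said, you should close the response body, not the request body. There is no need to close the request body in a server handler.
about a year ago
dsgdg54ef4365: Does the application close the HTTP response body? Show some code.
about a year ago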