2019-08-16 10:24

nginx 502 errors persist, with no application errors

A few days ago I suddenly ran into a strange problem.

My application had been running for quite some time with no issues. Then I started seeing a steady stream of 502 errors. We handle about 50,000 requests per minute with an average server response time of 12 ms.

There are no application panics (errors), and the nginx config allows up to 10000 worker connections.

Other configuration:

sendfile        on;
keepalive_timeout  600;
server_tokens off;
client_max_body_size 20m;

Can anyone point me in the right direction for solving this problem? I get one of the errors below, mostly the first one (sendfile()).

2019/08/16 15:01:42 [error] 30#0: *60729 sendfile() failed (32: Broken pipe) while sending request to upstream, client: <IP>, server: <hostname>, request: "POST <endpoint> HTTP/1.1", upstream: "<endpoint>", host: "<hostname>"

2019/08/16 15:01:45 [error] 30#0: *60821 readv() failed (104: Connection reset by peer) while reading upstream, client: <IP>, server: <hostname>, request: "POST <endpoint> HTTP/1.1", upstream: "<endpoint>", host: "<hostname>"

2019/08/16 14:55:27 [error] 20#0: *42152 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <IP>, server: <hostname>, request: "POST <endpoint> HTTP/1.1", upstream: "<endpoint>", host: "<hostname>"

We use Go with the Gin framework, in case that helps with debugging.


1 answer

  • doufu8127 2019-08-27 10:43

    Looking at the request_length of each request, we found that the requests returning 502 were the ones with POST data larger than about 200 KB. So we used the config


    and set its value to 1 MB; by default it is the size of up to two memory pages (16 KB on 64-bit machines). If the POST data is larger than 16 KB, nginx stores it in a temp file on disk, which adds I/O latency. After this change, the sendfile() failed errors dropped to zero immediately. For every request I logged


    and hence it was easy to find all the 502s in the access logs along with their request sizes.
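
The log-based approach described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual setup: it assumes a hypothetical access-log format in which each line contains `status=<code>` and `request_length=<bytes>` fields (for example, written from nginx's `$status` and `$request_length` variables), and the helper names `parseLine` and `failedSizes` and the sample lines are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// entry holds the two fields of interest from one access-log line.
type entry struct {
	status        int
	requestLength int64
}

// parseLine extracts status= and request_length= from a line written in the
// hypothetical key=value log format assumed for this sketch.
func parseLine(line string) (entry, bool) {
	var e entry
	var haveStatus, haveLength bool
	for _, field := range strings.Fields(line) {
		key, value, ok := strings.Cut(field, "=")
		if !ok {
			continue
		}
		switch key {
		case "status":
			if n, err := strconv.Atoi(value); err == nil {
				e.status, haveStatus = n, true
			}
		case "request_length":
			if n, err := strconv.ParseInt(value, 10, 64); err == nil {
				e.requestLength, haveLength = n, true
			}
		}
	}
	return e, haveStatus && haveLength
}

// failedSizes returns the request lengths of all log lines with the given status.
func failedSizes(log string, status int) []int64 {
	var sizes []int64
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if e, ok := parseLine(sc.Text()); ok && e.status == status {
			sizes = append(sizes, e.requestLength)
		}
	}
	return sizes
}

func main() {
	// Made-up sample lines, for illustration only.
	sample := `10.0.0.1 "POST /a HTTP/1.1" status=200 request_length=5120
10.0.0.2 "POST /b HTTP/1.1" status=502 request_length=262144
10.0.0.3 "POST /c HTTP/1.1" status=502 request_length=409600`
	for _, size := range failedSizes(sample, 502) {
		fmt.Printf("502 with request_length=%d bytes (%.0f KB)\n", size, float64(size)/1024)
	}
}
```

Listing the sizes of the failing requests side by side is what makes a size threshold like the one reported here stand out.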

    A small number of readv() errors still remain, though.
