duanhuokuang5280
2015-11-08 17:03

Nginx and php-fpm: connections stuck in "writing" state under high load

We have nginx/1.6.2 running with php5-fpm (5.6) on a Debian 8 system.

Over the past few days we have had higher load than usual because more users are hitting our servers, with most visitors coming in the evening hours between 6pm and midnight.

For a couple of days now, two different servers running the above setup have shown very slow response rates for several hours at a time. In Munin we saw that there were suddenly hundreds of nginx connections in the "writing" state, where there had previously been only about 20 at a time.

We get no errors other than connection timeouts on remote hosts trying to reach those servers. All the logs I checked looked normal.
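To watch the "writing" counter directly rather than only in Munin graphs, the plain-text output of nginx's stub_status module can be parsed. The sketch below assumes the Munin figures come from stub_status and that the endpoint returns the standard four-line format; the sample text and its numbers are made up for illustration. Note that, in practice, the Writing counter appears to include requests that are still waiting on the upstream (here php5-fpm), not only responses being sent to clients.

```python
import re


def parse_stub_status(text: str) -> dict:
    """Parse the plain-text body produced by nginx's stub_status module."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The totals line holds three counters: accepts, handled, requests.
    totals = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


# Illustrative sample only; these numbers are not taken from the real servers.
SAMPLE = """Active connections: 912 
server accepts handled requests
 16630948 16630948 31070465 
Reading: 4 Writing: 900 Waiting: 8 
"""

status = parse_stub_status(SAMPLE)
print("writing connections:", status["writing"])
```

Polling this every few seconds and logging the "writing" value alongside the request rate would show whether the pile-up starts before or after throughput drops.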

The problem can be fixed with a restart of php5-fpm.

My question now is: why do hundreds of processes suddenly claim they are writing? Is there a known issue, or a config setting we missed, that could cause this?

Here is the complete list of symptoms we see:

  • Instead of fewer than 20 short-lived active connections, we see 100 to 900 connections in the writing state. All nginx connections hit php5-fpm (these servers do not serve static content), and the average runtime of the PHP scripts is 80 ms.
  • The problem occurs only when the total nginx request rate exceeds 300 req/s. Throughput then drops from ~350 to ~250 req/s, yet those 250 req/s show up as up to 900 "writing" connections.
  • Many of these connections eventually time out and return no valid result.
  • There are no errors in our logs.
  • Network (eth) and database traffic, as well as CPU load, match the reduced rate of 250 req/s, so as far as I can tell no actual "writing" is happening.
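The numbers in the list above can be cross-checked with Little's law (connections in flight = request rate × time per request). This is only a back-of-the-envelope sketch that assumes a steady state; the function names are made up for illustration.

```python
def expected_in_flight(requests_per_s: float, latency_ms: float) -> float:
    """Little's law: average number of requests in flight at steady state."""
    return requests_per_s * latency_ms / 1000


def implied_latency_s(in_flight: float, requests_per_s: float) -> float:
    """Little's law rearranged: average time each request spends in the system."""
    return in_flight / requests_per_s


# Healthy case: 250 req/s at the reported 80 ms average script runtime.
healthy = expected_in_flight(250, 80)

# Degraded case: 900 "writing" connections at the same 250 req/s.
stalled = implied_latency_s(900, 250)

print(f"expected connections in flight when healthy: {healthy:.0f}")
print(f"implied time per request when stalled: {stalled:.1f} s")
```

Under these assumptions the healthy figure is about 20 connections, which matches what was seen before the problem, while 900 connections at 250 req/s implies each request spends several seconds in the system. That points at requests being held somewhere in the PHP path rather than at nginx genuinely writing more data.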

The setup is as stated above. We use Zend's built-in opcode cache and APCu as a user-variable cache. One of the servers runs a memcache instance (which works fine throughout the problem) and the other runs Redis, which also keeps running fine while the problem occurs.

Can anyone shed some light on what the problem might be?

Thanks!


2 answers

  • douhuan1497
    douhuan1497 2015-11-10 12:07
    Accepted answer

    We found the problem: APCu seems to be unstable with PHP 5.6.

    Details:

    • debian 8
    • nginx/1.6.2
    • PHP 5.6.14-0+deb8u1
    • APCu 4.0.7 (Revision: 328290, 126M shm_size)

    We used xhprof to profile requests while the server was slow (see question) and noticed that each APCu read/write operation took more than 100 ms. Clearing the APCu variables did not help. All other parts of the code ran at normal speed.

    We completely disabled our use of APCu and the system has been stable since.

    So it seems that this APCu version is unstable under load with PHP 5.6, at least for us.
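The profiling step described in this answer was done with xhprof in PHP. Purely to illustrate the idea (time each cache operation and flag the ones above a threshold), here is a minimal Python sketch. It is not xhprof and not the answerer's code; the `timed` helper, the labels, and the in-memory dictionary standing in for a cache are all hypothetical.

```python
import time
from contextlib import contextmanager

# Threshold taken from the answer: operations slower than 100 ms are suspicious.
SLOW_THRESHOLD_S = 0.100

# Collected (label, elapsed_seconds) pairs for operations over the threshold.
slow_ops = []


@contextmanager
def timed(label, threshold_s=SLOW_THRESHOLD_S):
    """Time the wrapped block and record it if it exceeds the threshold."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            slow_ops.append((label, elapsed))


# A plain dict stands in for the user cache (hypothetical stand-in for APCu).
cache = {}

with timed("cache_store"):
    cache["user:42"] = "profile"

with timed("slow_fetch"):
    time.sleep(0.15)  # simulate a cache read that stalls
    cache.get("user:42")

for label, elapsed in slow_ops:
    print(f"slow operation: {label} took {elapsed * 1000:.0f} ms")
```

Wrapping only the cache calls this way, rather than whole requests, is what makes it possible to tell a slow cache apart from slow application code.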

  • douchui4459
    douchui4459 2018-09-17 12:32

    We had the same problem. In our case the data in Redis exceeded "maxmemory", so Redis was unable to write any more data. I could log in with redis-cli but could not set a value. If you are having this issue, log in to Redis with redis-cli and try to set something; if Redis's memory is full, you will get an error.
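Besides trying a write by hand, the memory check in this answer can be automated by comparing `used_memory` against `maxmemory` in the output of `redis-cli INFO memory`. The sketch below only parses that text. It assumes a Redis version whose INFO output includes a `maxmemory` field (older versions may need `CONFIG GET maxmemory` instead), and the sample values are made up.

```python
def parse_info(text: str) -> dict:
    """Turn 'key:value' lines from Redis INFO output into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_is_full(info_text: str) -> bool:
    """Return True if used_memory has reached a configured maxmemory limit."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields["maxmemory"])
    # maxmemory of 0 means no limit is configured.
    return limit > 0 and used >= limit


# Illustrative sample only; not output from the servers discussed here.
SAMPLE_INFO = """# Memory
used_memory:134217900
maxmemory:134217728
maxmemory_policy:noeviction
"""

print("redis memory full:", memory_is_full(SAMPLE_INFO))
```

Whether a full instance actually rejects writes also depends on `maxmemory_policy`: with an eviction policy other than `noeviction`, Redis would normally evict keys rather than return an error.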
