I am having issues with NGINX. I have three EC2 instances behind an ELB, each running NGINX and php-fpm. On all three systems NGINX mysteriously crashes. Below I have included a portion of the log in debug mode.
Before I paste the log, here is my theory about what may be happening; maybe someone can confirm it or offer better insight that I can dig into. From what I have found online, a major reason for NGINX crashing is a known issue with NFS shares: NGINX requests a file while the NFS mount is busy, and the call blocks. I have about 6 NFS shares mounted on these systems, one per site. The shares contain only the directories that had to be shared, such as uploaded images and avatars.
I have read online that one optimization is to set NGINX to use 'epoll'. While I do not set it explicitly in the configuration, you can see it in use in the log. Should I still add the setting to the conf file? Are the NFS shares the source of my problem?
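For reference, this is roughly what I understand the explicit setting would look like. It is a sketch based on the documented `use` directive in the `events` block, not an excerpt from my current configuration:

```nginx
# /etc/nginx/nginx.conf (path taken from the configure arguments)
events {
    # Explicitly select the epoll connection processing method (Linux).
    # This line is NOT in my current config; it is the setting I am asking about.
    use epoll;
}
```

As far as I understand, nginx picks the most efficient available method by default, which would explain why epoll already appears in the log without this line.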
Thanks, any help is greatly appreciated.
Here is the output of nginx -V:
nginx version: nginx/1.4.7
built by gcc 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC)
TLS SNI support enabled
configure arguments:
--prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/var/lib/nginx/tmp/client_body
--http-proxy-temp-path=/var/lib/nginx/tmp/proxy
--http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi
--http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi
--http-scgi-temp-path=/var/lib/nginx/tmp/scgi
--pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx
--user=nginx --group=nginx --with-file-aio --with-ipv6
--with-http_ssl_module --with-http_spdy_module
--with-http_realip_module --with-http_addition_module
--with-http_xslt_module --with-http_image_filter_module
--with-http_geoip_module --with-http_sub_module --with-http_dav_module
--with-http_flv_module --with-http_mp4_module --with-http_gunzip_module
--with-http_gzip_static_module --with-http_random_index_module
--with-http_secure_link_module --with-http_degradation_module
--with-http_stub_status_module --with-http_perl_module --with-mail
--with-mail_ssl_module --with-pcre --with-google_perftools_module
--with-debug
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'
--with-ld-opt=' -Wl,-E'
Here is the DEBUG information:
2014/09/17 08:37:46 [debug] 2034#0: worker cycle
2014/09/17 08:37:46 [debug] 2034#0: epoll timer: 11605
2014/09/17 08:37:46 [debug] 2034#0: epoll: fd:69 ev:0005 d:0000000001632281
2014/09/17 08:37:46 [debug] 2034#0: timer delta: 0
2014/09/17 08:37:46 [debug] 2034#0: posted events 0000000001661460
2014/09/17 08:37:46 [debug] 2034#0: posted event 0000000001661460
2014/09/17 08:37:46 [debug] 2034#0: posted event 0000000000000000
2014/09/17 08:37:46 [debug] 2034#0: worker cycle
2014/09/17 08:37:46 [debug] 2034#0: epoll timer: 11605
2014/09/17 08:37:50 [debug] 2034#0: epoll: fd:51 ev:0005 d:0000000001631500
2014/09/17 08:37:50 [debug] 2034#0: *1 event timer del: 51: 1410964731764
2014/09/17 08:37:50 [debug] 2034#0: *1 http process request line
2014/09/17 08:37:50 [debug] 2034#0: *1 http request line: "GET /data/avatars/m/0/163.jpg HTTP/1.1"
2014/09/17 08:37:50 [debug] 2034#0: *1 http uri: "/data/avatars/m/0/163.jpg"
2014/09/17 08:37:50 [debug] 2034#0: *1 http args: ""
2014/09/17 08:37:50 [debug] 2034#0: *1 http exten: "jpg"
2014/09/17 08:37:50 [debug] 2034#0: *1 posix_memalign: 00000000014FA890:4096 @16
2014/09/17 08:37:50 [debug] 2034#0: *1 http process request header line
2014/09/17 08:37:50 [debug] 2034#0: timer delta: 4205
2014/09/17 08:37:50 [debug] 2034#0: posted events 0000000001661460
2014/09/17 08:37:50 [debug] 2034#0: posted event 0000000001661460
2014/09/17 08:37:50 [debug] 2034#0: posted event 0000000000000000
2014/09/17 08:37:50 [debug] 2034#0: worker cycle
2014/09/17 08:37:50 [debug] 2034#0: epoll timer: 7400
2014/09/17 09:03:15 [debug] 2144#0: bind() 0.0.0.0:80 #46
2014/09/17 09:03:15 [debug] 2144#0: counter: 00007F697920F080, 1
2014/09/17 09:03:15 [debug] 2144#0: posix_memalign: 00000000011CF400:16384 @16
2014/09/17 09:03:18 [debug] 2153#0: bind() 0.0.0.0:80 #46
2014/09/17 09:03:18 [emerg] 2153#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
2014/09/17 09:03:18 [notice] 2153#0: try again to bind() after 500ms
2014/09/17 09:03:18 [debug] 2153#0: bind() 0.0.0.0:80 #46
2014/09/17 09:03:18 [emerg] 2153#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
2014/09/17 09:03:18 [notice] 2153#0: try again to bind() after 500ms
2014/09/17 09:03:18 [debug] 2153#0: bind() 0.0.0.0:80 #46
2014/09/17 09:03:18 [emerg] 2153#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
2014/09/17 09:03:18 [notice] 2153#0: try again to bind() after 500ms
2014/09/17 09:03:18 [debug] 2153#0: bind() 0.0.0.0:80 #46
2014/09/17 09:03:18 [emerg] 2153#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
2014/09/17 09:03:18 [notice] 2153#0: try again to bind() after 500ms
2014/09/17 09:03:18 [debug] 2153#0: bind() 0.0.0.0:80 #46
2014/09/17 09:03:18 [emerg] 2153#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
2014/09/17 09:03:18 [notice] 2153#0: try again to bind() after 500ms
2014/09/17 09:03:18 [emerg] 2153#0: still could not bind()