Is it possible that a request for a page, where the server or PHP might have an issue, freezes and even disconnects unrelated SSH sessions?
I am running a simple webpage (10 pictures and some text) in a dockerized environment with a separate reverse proxy, a web server, and a database (nginx, php-fpm and postgresql).
The whole system ran for about a year without a restart and without problems. For about a month now, I have had a new issue with page/system freezes. When I visit my webpage, it locks up from time to time (sometimes one instance is enough, other times I need to open up to 20) and needs about 30 seconds to start reacting again. The strange thing is that if I am connected to the server over SSH in parallel, my terminal sometimes (not always) gets disconnected as well. That is why I believed it has something to do with the system (but I can't find anything there, so I am trying a different perspective here).
Server (only remote access available): Debian GNU/Linux 9.4 (stretch); kernel 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux; 68 GB RAM, 8 cores, 2x 4 TB HDDs and a 1 TB SSD; 1 Gbit uplink.
I have monitoring installed, and there does not seem to be any high load on I/O, network, CPU, or anything else during the lockup (I am not monitoring PHP stats, though). I also have the same setup running on a local test server (different hardware, kernel 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux), and that server has no freezing issues, which is another argument against the issue being in the dockerized environment or my page code.
What I have done so far on the hardware side:
- 1.) SMART diagnostics - no obvious issues (the backup disk, not the one the servers are stored on, has shown for some time: 191 G-Sense_Error_Rate 0x0032 001 001 000, but the provider ran a separate test some time ago and said that the disk has no issue and that G-Sense_Error_Rate has little informational value anyhow)
- 2.) atop (htop and iotop are live tools and SSH disconnects, so I can't watch them as the problem occurs) at a 1 s interval for 300 samples (5 min), during which I was able to produce multiple freezes, but there were no obvious load issues (granted, this is the first time I am looking at these things, but there was also none of the high-severity line coloring that atop applies automatically)
- 3.) I also have a dockerized monitoring stack running (the freeze occurs both with it running and with it disabled, so it should not be the cause either), where I can view the containers separately, and they do not show anything alarming either
- 4.) restarted the whole server - issue continues
- 5.) memtester on 55 of 65 GB RAM - no issues
- 6.) no problems in syslog
- 7.) pinged the server while producing the error: the ping is quick at 27 ms, but while the server hangs I lose about 1 ping in 10 (during those 30-40 s; afterwards ping is perfect again). I cannot figure out why that is.
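To line up the lost pings from point 7 with the atop samples from point 2, a timestamped probe run from a client machine might help. The following is only a rough sketch, not something I have validated on the affected server: the hostname is a placeholder, the function names are made up for illustration, and it probes TCP port 22 instead of sending ICMP (raw ICMP sockets usually need root), so it measures connect time rather than a true ping.

```python
import socket
import time
from datetime import datetime


def probe(host, port=22, timeout=1.0):
    """Return the TCP connect time in milliseconds, or None if the connect fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:  # covers timeouts, refused connections and unreachable hosts
        return None


def loss_ratio(samples):
    """Fraction of probes that failed (None entries) in a list of samples."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s is None) / len(samples)


def run(host, count=300, interval=1.0):
    """Probe once per interval and print one timestamped line per probe."""
    samples = []
    for _ in range(count):
        rtt = probe(host)
        samples.append(rtt)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(stamp, "LOST" if rtt is None else f"{rtt:.1f} ms", flush=True)
        time.sleep(interval)
    print(f"loss: {loss_ratio(samples):.0%}")
    return samples


if __name__ == "__main__":
    # Hypothetical hostname; 300 probes at 1 s matches the 5 min atop window.
    run("server.example.org")
```

The timestamps of the LOST lines could then be compared with the atop samples recorded on the server for the same seconds, assuming both clocks are reasonably in sync.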
Where else could I look?
Any suggestions are highly appreciated! Thanks!