I tried to create about 5 thousand concurrent clients (non stop sending of data from server to client) connected to my glassfish websocket server. (CPU: Double-Core, 8 GB RAM)
After about 2500 clients were connected the connection time was about 67(!) seconds and I was not able to connect more clients cause of TimeOutException
.
Some facts:
- Thread Pool max size was set to 12.000.
- At the time of first TimeoutException I had 2500 clients and about 2450 Threads. So we speak here about one thread per connection.
- It is not a memory issue!
Then a wrote two simple Websocket Proxy Server in Node.js an golang to handle websocket connections. The data exchange between Proxy Server and glassfish server happen over simple Websocket connection.
Now I was able to create over 5 thousand concurrent clients without problems. The same issue I had with Wildfly 8 and Tomcat 8. Here is the evaluation:
Now my question.
Tyrus is the implementation of Websocket protocol in glassfish and uses java.nio
library under the hood for non blocking I/O.
So why it has anyway so poor scalability? Or why is the scalability so different. I mean I see no advantages of java.nio
P.S. just scheduling overhead?
EDIT: How to reproduce the issue
Client software for creating and connecting Clients and the Websocket Server are on different PCs.
- Start a glassfish simple echo websocket server. There are about 76 active glassfish threads.
- now connect 2000 clients without initiating any communication, just connect to the server - Idle mode. In my case it there was no problem to do it. Since all clients were connected there were about 80 active threads. Nearly unchanged.
- Using a
Timer
let all your 2000 clients start communication with the websocket server simultaneously. You will notice the amount of threads is now about 2019. The response time is about 6.5 seconds. - Finally try to connect the next 500 clients. The first TimeoutException will occur. Only few clients will be connected.
- Stop simultaneously communication.