Symptom
After un-deploying and re-deploying an application, the web and HTTP statistics no longer updated.
Finding
Updates stopped reaching the statistics counters because the connection established via class instrumentation was cut when an application was un-deployed. Un-deploying an application disables the statistics providers that belong specifically to that application, and this wrongly triggered removal of the class instrumentation for the probe provider class that connects the events to the statistics counters. So instead of disconnecting only the application-specific statistics, all statistics based on that class were disabled. Since the overall HTTP and web statistics use the same class as the application-specific ones, the overall statistics were disabled as well.
Solution
Because the trigger for class instrumentation changes was a static call in the innermost part of the logic, there was no clear way to trigger instrumentation updates only after the logic had finished updating the changed probes and invokers. Another mistake was that disabling an invoker disabled instrumentation of the whole class, even though there might be other invokers and other probes for the same provider class. This is why both transform and untransform became a single update. Whether an update needs to refresh the instrumentation or remove it is now evaluated based on the probes and their active invokers. If the probe is enabled and has active invokers, the instrumentation is kept and updated. Methods with no invokers are no longer instrumented.
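The decision rule described above can be sketched roughly as follows. This is only an illustration under assumptions: `SketchProbe` and `InstrumentationDecision` are hypothetical stand-ins, not the classes touched by this PR, and the real logic works on the actual probe and invoker objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a probe bound to one provider method. */
final class SketchProbe {
    final String methodName;
    final boolean enabled;
    final int activeInvokers;

    SketchProbe(String methodName, boolean enabled, int activeInvokers) {
        this.methodName = methodName;
        this.enabled = enabled;
        this.activeInvokers = activeInvokers;
    }

    /** A method is instrumented only if its probe is enabled and has at least one active invoker. */
    boolean needsInstrumentation() {
        return enabled && activeInvokers > 0;
    }
}

/** Hypothetical helper that derives the instrumentation state from the probes. */
final class InstrumentationDecision {

    /** Returns the names of the methods that should be (or stay) instrumented. */
    static List<String> methodsToInstrument(List<SketchProbe> probes) {
        List<String> result = new ArrayList<>();
        for (SketchProbe probe : probes) {
            if (probe.needsInstrumentation()) {
                result.add(probe.methodName);
            }
        }
        return result;
    }

    /** The class keeps its instrumentation as long as at least one method still needs it. */
    static boolean keepClassInstrumented(List<SketchProbe> probes) {
        return !methodsToInstrument(probes).isEmpty();
    }
}
```

With this shape, disabling one invoker only changes the outcome for the class when no other probe on the same provider class still has active invokers.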
To avoid running the update for each trigger call (that is, the number of methods times the number of statistics provider instances), the trigger only flags the class for update, and an asynchronous daemon thread runs the actual updates after a small delay. Thanks to the delay, the logic causing multiple triggers has likely completed and flagged all needed updates before the thread performs them.
During shutdown the thread ends when it is interrupted and does not perform (most of) the triggered transformations that a shutdown normally causes.
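A minimal sketch of the flag-and-defer idea, assuming a hypothetical `DeferredUpdater` class, a fixed polling delay, and a callback that stands in for the real instrumentation update (none of these names come from the PR):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical sketch: triggers only flag a class name, a daemon thread applies updates later. */
final class DeferredUpdater {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final Thread worker;

    DeferredUpdater(long delayMillis, Consumer<String> update) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    // The delay gives the triggering logic time to flag everything it needs.
                    Thread.sleep(delayMillis);
                    for (String className : pending) {
                        // remove() returns true only once per flagged name, so each
                        // pending class is updated once per pass.
                        if (pending.remove(className)) {
                            update.accept(className);
                        }
                    }
                }
            } catch (InterruptedException e) {
                // Interrupted (for example during shutdown): end without applying pending updates.
            }
        }, "sketch-instrumentation-updater");
        worker.setDaemon(true); // must not keep the JVM alive
        worker.start();
    }

    /** Cheap to call many times; repeated flags for the same class collapse into one entry. */
    void flag(String className) {
        pending.add(className);
    }

    void shutdown() {
        worker.interrupt();
    }

    boolean isAlive() {
        return worker.isAlive();
    }
}
```

In this sketch, flags that arrive on both sides of a wake-up of the thread lead to two updates for the same class, which corresponds to the occasional double handling caused by unlucky async timing.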
Note to the Reviewer
The main change is in ProbeProviderClassFileTransformer. Besides that, I only added some missing generics and turned some hash maps into concurrent ones while looking for the problem.
Note that synchronized was largely removed where present. Synchronisation is now done using a concurrent map and atomics. To allow the key class to be reclaimed, the key was changed to String, which allowed the use of a concurrent map. Other methods no longer need synchronisation because only the updater thread calls them.
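As a rough illustration of replacing synchronized with a concurrent map and atomics, the following sketch keys the state by class name rather than by the Class object, so the map itself holds no reference to the class. `TransformerRegistry` and its methods are hypothetical names, and the sketch ignores that two class loaders can define classes with the same name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical registry: lock-free "needs update" flags keyed by class name. */
final class TransformerRegistry {
    // String keys: the map does not reference the Class itself.
    private final ConcurrentMap<String, AtomicBoolean> dirtyByClassName = new ConcurrentHashMap<>();

    /** May be called from any thread; no synchronized block needed. */
    void flag(Class<?> providerClass) {
        dirtyByClassName
                .computeIfAbsent(providerClass.getName(), name -> new AtomicBoolean())
                .set(true);
    }

    /** Returns true exactly once per flagged state; intended for the single updater thread. */
    boolean takeFlag(String className) {
        AtomicBoolean dirty = dirtyByClassName.get(className);
        return dirty != null && dirty.compareAndSet(true, false);
    }
}
```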
This also fixes PAYARA-1285, which should be checked with the reproducer given in the ticket.
Testing
This is easiest to test after #4274 is merged, since it will then be easy to verify in the monitoring console that un-deploying and re-deploying an application no longer causes the requests/sec to drop to zero.
With MC started, enable web and HTTP monitoring and wait until you see some requests/sec in the Core view of MC. Redeploy MC and check that the requests/sec are still above 0, most likely 0.5 due to data polling every 2 seconds.
Testing should also check that enabling and disabling the web or HTTP monitoring has the expected effect. Again, this is easiest to verify once #4274 is in master.
To check that instrumentation is no longer added to or removed from provider classes multiple times, I put a log output at INFO level in both branches of this if-statement: https://github.com/payara/Payara/pull/4278/files#diff-16bec2e5206edbe4e95b050af4617992R193. Note that occasionally the same class may still be handled twice due to unlucky async timing, but not more often than that.
This question originates from the open-source project: payara/Payara