The current implementation of Oak's observation event processing is too eager and is therefore unfair under load.
Consider many (e.g. 200) EventListeners backed by a relatively small thread pool (e.g. 5, the default in Sling). When a particular BackgroundObserver gets a thread, it (in BackgroundObserver.completionHandler.call) currently processes all of its pending changes, irrespective of how many there are - i.e. it is eager. Only once that BackgroundObserver has processed all changes does it let go and 'pass the thread' to the next BackgroundObserver. If for some reason changes (i.e. commits) keep coming in while a BackgroundObserver is busy processing an earlier change, this lengthens its processing loop. As a result, the remaining (e.g. 195) EventListeners may have to wait a long time until it is their turn - hence unfair.
Now combine the above pattern with a scenario where MongoDB is used as the underlying store. In that case, to remain performant it is important that the diffs (for compareAgainstBaseState) are served from the MongoDiffCache in as many cases as possible, to avoid a round trip to mongod. The unfairness in the BackgroundObservers can result in a large delay between the 'first' observers getting an event and the 'last' one (of those 200). When this delay increases due to a burst in load, there is a risk that the diffs are no longer in the cache by the time the last observers need them - those observers are effectively kicked out of the (diff) cache. Once this happens, the situation gets even worse, since new commits keep coming in while old changes still have to be processed - all of which are processed in 'stripes of 5 listeners' before the next ones get a chance. At some point this results in totally inefficient cache behavior; in other words, all diffs have to be read from mongod.
To avoid this there are probably a number of options - a few that come to mind:
- increase the thread pool to match, or be closer to, the number of listeners (but this has other disadvantages, e.g. the cost of thread switching)
- make BackgroundObservers fairer by limiting the number of changes they process before they give others a chance to be served by the pool.
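
To illustrate the second option, here is a minimal sketch of a bounded-batch observer. It is not Oak's BackgroundObserver: the class FairObserver, its maxBatch parameter and the demo class are hypothetical names made up for this example. The idea is that each run on the pool handles at most maxBatch changes and then re-submits itself, so it queues up behind the other observers instead of holding the thread until its own queue is empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical sketch of a fair observer; not the actual Oak class. */
class FairObserver<T> {
    private final Queue<T> pending = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor executor;
    private final Consumer<T> listener;
    private final int maxBatch; // assumed tuning knob: changes handled per run

    FairObserver(Executor executor, Consumer<T> listener, int maxBatch) {
        this.executor = executor;
        this.listener = listener;
        this.maxBatch = maxBatch;
    }

    /** Called for every new change (commit); never blocks on the listener. */
    void contentChanged(T change) {
        pending.add(change);
        schedule();
    }

    /** Submit a drain task unless one is already queued or running. */
    private void schedule() {
        if (scheduled.compareAndSet(false, true)) {
            executor.execute(this::drain);
        }
    }

    /** Handle at most maxBatch changes, then give the thread back to the pool. */
    private void drain() {
        try {
            for (int i = 0; i < maxBatch; i++) {
                T change = pending.poll();
                if (change == null) {
                    break;
                }
                listener.accept(change);
            }
        } finally {
            scheduled.set(false);
            // Leftover work goes to the back of the executor's queue,
            // behind any other observers that are already waiting.
            if (!pending.isEmpty()) {
                schedule();
            }
        }
    }
}

class FairObserverDemo {
    public static void main(String[] args) {
        // Deterministic stand-in for a one-thread pool: tasks run in FIFO order.
        ArrayDeque<Runnable> tasks = new ArrayDeque<>();
        List<String> order = new ArrayList<>();
        FairObserver<Integer> a = new FairObserver<>(tasks::add, c -> order.add("A" + c), 2);
        FairObserver<Integer> b = new FairObserver<>(tasks::add, c -> order.add("B" + c), 2);
        for (int i = 1; i <= 4; i++) {
            a.contentChanged(i);
        }
        b.contentChanged(1);
        b.contentChanged(2);
        while (!tasks.isEmpty()) {
            tasks.poll().run();
        }
        System.out.println(order); // prints [A1, A2, B1, B2, A3, A4]
    }
}
```

With an eager drain (effectively an unbounded maxBatch) the same input would yield A1..A4 before B sees anything. The choice of maxBatch is a trade-off: a small value bounds how long any listener waits, at the cost of more task submissions to the pool.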