[OAK-2683] the "hitting the observation queue limit" problem - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: core, mongomk
Labels:
- observation
- resilience

Epic Link:
Improve observation resilience

Description

There are several tickets in this area:

~~OAK-2587~~: observation threading is too eager, causing the observation queue to grow
~~OAK-2669~~: avoid diffing from mongo by using the persistent cache instead
~~OAK-2349~~: might be a duplicate of, or at least similar to, OAK-2669
~~OAK-2562~~: DiffCache is inefficient

Yet I think it makes sense to create this summarizing ticket, describing once more what happens when the observation queue hits the limit, and eventually how this can be improved.

Consider the following scenario (also compare with ~~OAK-2587~~ - but that one focused more on eagerness of threading):

the rate of incoming commits is high and generates many changes in the observation queues, so those queues start to fill up
depending on the underlying nodestore, calculating diffs is more or less expensive - but at least for mongomk it is important that the diff can be served from the cache
- in the case of mongomk it can happen that diffs are no longer found in the cache and thus require a round-trip to mongo, which is of course orders of magnitude slower than the cache. As a result the queue grows even faster, since dequeuing is now slower.
- not sure about tarmk - I believe diffing should always be fast there
so based on the above, there can be a situation where the queue grows and hits the configured limit
once this limit is reached, the current mechanism is to collapse all subsequent changes into one big change that is marked as an external event - let's call this a collapsed-change.
this collapsed-change is now part of the normal queue and will eventually 'walk down the queue' and be processed normally - so there is a high chance that yet another collapsed-change is created as soon as the queue hits the limit again. This game can go on for a while, resulting in a queue that contains many, or mostly, such collapsed-changes.
there is an additional assumption here: diffing such collapsed-changes is more expensive than normal diffing - plus it is almost guaranteed that the diff cannot be shared between observation listeners, since the exact 'collapse borders' depend on the timing of each listener's queue - ie the collapse-diffs are unique and thus not cacheable.
so as a result: once you have those collapse-diffs you can hardly get rid of them - they are heavy to process, hence dequeuing is very slow
at the same time, there are likely always some commits happening in a typical system, eg with sling on top you have sling discovery, which does heartbeats every now and then. So there are always new commits that add to the load.
this creates a situation where quite a small additional commit rate can keep all the queues filled - because the queues are full of 'heavy collapse-diffs' that have to be calculated individually for each and every listener (of which you could have eg 150-200).
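
To make the collapsing behaviour above more concrete, here is a minimal sketch of a single listener's bounded queue. It is not Oak code: the class `CollapsingQueue`, its `Change` entries and the integer revisions are all hypothetical and only model the idea that, once the limit is reached, every further commit is folded into the last queue entry. It also shows why two listeners with slightly different dequeue timing end up with different 'collapse borders', so their collapse-diffs cannot be shared.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of one listener's bounded observation queue (not Oak code). */
class CollapsingQueue {

    /** A queued change covering the revisions from..to; a collapsed entry spans several commits. */
    static final class Change {
        final int from;
        int to;
        boolean collapsed;

        Change(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return (collapsed ? "collapsed" : "change") + "[" + from + ".." + to + "]";
        }
    }

    private final int limit;
    private final Deque<Change> queue = new ArrayDeque<>();
    private int head = 0; // last revision that was handed to this queue

    CollapsingQueue(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = limit;
    }

    /** Queues the commit that moves the head to the given revision. */
    void added(int revision) {
        if (queue.size() >= limit) {
            // Queue is full: fold this commit into the last entry instead of adding a new one.
            Change last = queue.peekLast();
            last.to = revision;
            last.collapsed = true;
        } else {
            queue.addLast(new Change(head, revision));
        }
        head = revision;
    }

    /** Removes and returns the oldest queued change, or null if the queue is empty. */
    Change poll() {
        return queue.pollFirst();
    }

    @Override
    public String toString() {
        return queue.toString();
    }

    public static void main(String[] args) {
        // Listener A never gets to dequeue while six commits arrive.
        CollapsingQueue a = new CollapsingQueue(3);
        for (int r = 1; r <= 6; r++) {
            a.added(r);
        }
        // Listener B sees the same commits but dequeues once after the third one.
        CollapsingQueue b = new CollapsingQueue(3);
        for (int r = 1; r <= 3; r++) {
            b.added(r);
        }
        b.poll();
        for (int r = 4; r <= 6; r++) {
            b.added(r);
        }
        System.out.println("A: " + a); // A: [change[0..1], change[1..2], collapsed[2..6]]
        System.out.println("B: " + b); // B: [change[1..2], change[2..3], collapsed[3..6]]
    }
}
```

In this toy run listener A needs the diff 2..6 while listener B needs 3..6. If a diff cache were keyed by the (from, to) pair, which is an assumption of this sketch, neither listener could reuse the other's result, and with 150-200 listeners each one would have to compute its own expensive diff.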

So again, possible solutions for this:

~~OAK-2669~~: tune diffing via persistent cache
~~OAK-2587~~: have more threads to remain longer 'in the cache zone'
tune your input speed explicitly to avoid filling the observation queues (this would be specific to your use-case of course, but can be seen as explicitly throttling on the input side)
increase the relevant caches to the max
but I think we will eventually come up with a broader improvement for this observation queue limit problem by either
- doing flow control - eg via the commit rate limiter (also see ~~OAK-1659~~)
- moving the handling of observation changes out to a messaging subsystem - possibly to handle local events only (since handling external events hurts the scalability of the system if not done right) - also see the corresponding suggestion on the dev list
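
As a rough illustration of the flow control option, the sketch below delays commits depending on how full the observation queue is. This is not the actual CommitRateLimiter from Oak: the class `QueueAwareThrottle`, its parameters (`queueLimit`, `startRatio`, `maxDelayMillis`) and the linear ramp are assumptions made up for this example.

```java
/** Hypothetical commit throttle driven by the observation queue fill level (not Oak's CommitRateLimiter). */
final class QueueAwareThrottle {

    private final int queueLimit;      // configured observation queue limit
    private final double startRatio;   // fill ratio at which throttling starts, in [0, 1)
    private final long maxDelayMillis; // delay applied once the queue is completely full

    QueueAwareThrottle(int queueLimit, double startRatio, long maxDelayMillis) {
        if (queueLimit < 1 || startRatio < 0 || startRatio >= 1 || maxDelayMillis < 0) {
            throw new IllegalArgumentException("invalid throttle configuration");
        }
        this.queueLimit = queueLimit;
        this.startRatio = startRatio;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Returns the delay for the next commit: 0 below the start ratio, then rising linearly to the maximum. */
    long delayMillis(int queueLength) {
        double fill = Math.min(1.0, (double) queueLength / queueLimit);
        if (fill <= startRatio) {
            return 0;
        }
        double excess = (fill - startRatio) / (1.0 - startRatio);
        return Math.round(excess * maxDelayMillis);
    }

    /** Blocks the committing thread for the computed delay before the commit proceeds. */
    void beforeCommit(int queueLength) throws InterruptedException {
        long delay = delayMillis(queueLength);
        if (delay > 0) {
            Thread.sleep(delay);
        }
    }

    public static void main(String[] args) {
        QueueAwareThrottle throttle = new QueueAwareThrottle(1000, 0.5, 200);
        System.out.println(throttle.delayMillis(500));  // 0
        System.out.println(throttle.delayMillis(750));  // 100
        System.out.println(throttle.delayMillis(1000)); // 200
    }
}
```

The point of slowing commits down before the limit is reached is to keep the queues out of the collapsing regime altogether, at the cost of pushing back on writers.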

Attachments

Issue Links

is blocked by

OAK-2829 Comparing node states for external changes is too slow

Closed

is related to

OAK-2562 DiffCache is inefficient

Closed

OAK-4568 JournalEntry.applyTo() creates complete change tree in memory

Closed

OAK-4572 Overflow to disk threshold too high

Closed

OAK-2349 DiffCache based on persistent cache

Resolved

OAK-2836 Create diff cache entry for merged persisted branch

Resolved

OAK-2587 observation processing too eager/unfair under load

Closed

OAK-1659 Improve CommitRateLimiter to delay commits

Closed

OAK-2669 Use Consolidated diff for local changes with persistent cache to avoid calculating diff again

Closed

OAK-2685 Track root state revision when reading the tree

Closed

OAK-4605 Separate persistent cache for diff and local_diff

Closed

OAK-4528 diff calculation in DocumentNodeStore should try to re-use journal info on diff cache miss

Closed

relates to

OAK-4581 Persistent local journal for more reliable event generation

Open


Sub-Tasks

1.

Report maximum observation queue length in ObservationTest benchmark

Closed

Marcel Reutegger
