Details
Bug
Status: Open
Major
Resolution: Unresolved
1.4, 1.5.14
None
None
None
Description
Moved over to OAK-5528 due to internal Jira issues. Please do not delete this ticket while the problem is being investigated.
We are fighting with cluster nodes losing their lease and shutting down oak-core in a cloud environment. For reasons unknown at this point, the whole process seems to skip about two minutes of real time.
Oak currently does not recover from this situation. Code analysis shows that ClusterNodeInfo is handed the LeaseCheckDocumentStoreWrapper instance to use as its store. This is fatal, because any store operation that renewLease() attempts will first invoke performLeaseCheck(). Once the FailureMargin is reached, the lease check stalls the renewLease() thread for 5 retry attempts and then declares the lease lost, so the one thread that could renew the lease is blocked by the check itself.
The ClusterNodeInfo should instead be using the "real" DocumentStore, not the wrapped one, IMO.
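To make the failure mode above concrete, here is a minimal, self-contained sketch. It is not Oak code: `Store`, `RealStore`, `LeaseCheckingStore`, `Clock`, the `"leaseEnd"` key, and the 120 s / 20 s values are all hypothetical stand-ins chosen for illustration, and the retry loop does not actually sleep. It only shows why a renewal routed through a lease-checking wrapper cannot succeed once the clock has jumped past the failure margin, while the same renewal against the unwrapped store can.

```java
import java.util.HashMap;
import java.util.Map;

class LeaseSketch {

    /** Hypothetical stand-in for a document store; not Oak's DocumentStore API. */
    interface Store {
        void put(String key, long value);
        Long get(String key);
    }

    /** The "real" store: no lease check, just storage. */
    static class RealStore implements Store {
        private final Map<String, Long> data = new HashMap<>();
        @Override public void put(String key, long value) { data.put(key, value); }
        @Override public Long get(String key) { return data.get(key); }
    }

    /** Mutable clock so the sketch can simulate the process skipping real time. */
    static class Clock { long now; }

    /** Hypothetical wrapper: every operation runs a lease check first. */
    static class LeaseCheckingStore implements Store {
        private final Store delegate;
        private final Clock clock;
        private final long failureMarginMs;

        LeaseCheckingStore(Store delegate, Clock clock, long failureMarginMs) {
            this.delegate = delegate;
            this.clock = clock;
            this.failureMarginMs = failureMarginMs;
        }

        private void performLeaseCheck() {
            // Assumption: 5 retries, as in the description. A real check would
            // wait between attempts, hoping another thread renews the lease.
            for (int attempt = 0; attempt < 5; attempt++) {
                Long leaseEnd = delegate.get("leaseEnd");
                if (leaseEnd == null || clock.now < leaseEnd - failureMarginMs) {
                    return; // lease is still comfortably valid
                }
            }
            throw new IllegalStateException("lease lost");
        }

        @Override public void put(String key, long value) {
            performLeaseCheck();
            delegate.put(key, value);
        }

        @Override public Long get(String key) {
            performLeaseCheck();
            return delegate.get(key);
        }
    }

    /** Renewal writes a new lease end time through whatever store it was given. */
    static void renewLease(Store store, Clock clock, long leaseMs) {
        store.put("leaseEnd", clock.now + leaseMs);
    }

    /** Returns true if the lease could be renewed after a simulated two-minute jump. */
    static boolean renewAfterTimeJump(boolean useWrappedStore) {
        final long leaseMs = 120_000;  // illustrative lease duration
        final long marginMs = 20_000;  // illustrative failure margin
        Clock clock = new Clock();
        RealStore real = new RealStore();
        Store store = useWrappedStore ? new LeaseCheckingStore(real, clock, marginMs) : real;

        renewLease(store, clock, leaseMs); // initial lease at t = 0
        clock.now += 120_000;              // the process "skips" two minutes
        try {
            renewLease(store, clock, leaseMs);
            return true;
        } catch (IllegalStateException e) {
            return false; // the check blocked the very call that would have fixed it
        }
    }

    public static void main(String[] args) {
        System.out.println("wrapped store: renewed = " + renewAfterTimeJump(true));
        System.out.println("real store:    renewed = " + renewAfterTimeJump(false));
    }
}
```

In this sketch the renewal through the wrapper fails after the time jump, while the renewal against the real store succeeds. This matches the suggestion above that the renewal path should bypass the lease check. Whether bypassing it is safe in Oak itself (for example if another node has already started recovery) is outside what this sketch models.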