Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
- None
Description
When discovery is used in a stack based on Jackrabbit Oak as the repository, the current way of discovering instances duplicates work to some extent: Oak, or more precisely the DocumentNodeStore, itself has a low-level lease mechanism in which it stores information about the cluster nodes, including a leaseEnd indicating the time after which others can consider a particular node dead/crashed. This corresponds closely to the discovery.impl heartbeat mechanism. In a stack built on top of oak-documentMk, we could make use of this fact and delegate the decision about whether a node in a cluster is alive to the Oak layer. Also, with OAK-2597 the relevant information, ActiveClusterNodes, is exposed via JMX, so that can become the new source of truth defining the cluster view.
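As a rough illustration of the lease idea described above, the following sketch derives the set of live cluster nodes from their leaseEnd timestamps. It is not Oak's API: the class `LeaseBasedView`, the method `activeNodes`, and the map of cluster id to leaseEnd (in milliseconds) are hypothetical names chosen only for this example.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Oak or Sling: it only illustrates how a
// leaseEnd timestamp can stand in for a discovery-owned heartbeat.
final class LeaseBasedView {
    private LeaseBasedView() {}

    /**
     * Returns the ids of all cluster nodes whose lease has not yet expired
     * at nowMillis. A node whose leaseEnd is at or before nowMillis is
     * treated as dead/crashed.
     */
    static SortedSet<Integer> activeNodes(Map<Integer, Long> leaseEndByClusterId, long nowMillis) {
        SortedSet<Integer> active = new TreeSet<>();
        for (Map.Entry<Integer, Long> entry : leaseEndByClusterId.entrySet()) {
            if (entry.getValue() > nowMillis) {
                active.add(entry.getKey());
            }
        }
        return active;
    }
}
```

For example, with leases {1: 5000, 2: 900, 3: 1001} and nowMillis = 1000, nodes 1 and 3 are considered active and node 2 is considered dead.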
When replacing discovery-owned heartbeats with Oak-owned ones, there is one important detail to watch out for: an instance can no longer easily determine whether another instance in the cluster has this new discovery bundle activated. Hence there is no guarantee that, when a voting happens, all active nodes (as reported by oak-documentMk) will actually respond. So the 'silent instance due to deactivated discovery bundle' case needs special attention/handling.
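To make the 'silent instance' case concrete, the sketch below separates the nodes that Oak reports as active from those that actually responded to a voting. It only identifies the silent instances; how they should then be handled is left open here, as it is in the description. The class `VotingRound` and its methods are hypothetical and exist only for this illustration.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of discovery.impl: it only shows how the
// set of silent instances could be derived during a voting.
final class VotingRound {
    private VotingRound() {}

    /**
     * Returns the instances that are active according to Oak but have not
     * responded to the voting, for example because their discovery bundle
     * is deactivated.
     */
    static SortedSet<String> silentInstances(Set<String> activeAccordingToOak, Set<String> responded) {
        SortedSet<String> silent = new TreeSet<>(activeAccordingToOak);
        silent.removeAll(responded);
        return silent;
    }

    /** True if every instance that Oak reports as active has responded. */
    static boolean allResponded(Set<String> activeAccordingToOak, Set<String> responded) {
        return silentInstances(activeAccordingToOak, responded).isEmpty();
    }
}
```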
Other than that, given the normal case of all active nodes having the bundle activated, the voting mechanism can stay the same as in discovery.impl. The topology connectors can be treated the same too (by storing announcements to their respective /var/discovery/clusterInstances/<slingId>/announcements/<announcerSlingId> node). The properties can be handled the same too (by storing to the /properties node). The only thing that gets replaced is the heartbeats.
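The announcement location mentioned above can be assembled from the two Sling ids as in the following minimal sketch. The class `DiscoveryPaths` and the method `announcementPath` are hypothetical names for this example; only the path layout is taken from the description.

```java
// Hypothetical helper: builds the repository path under which a topology
// connector announcement is stored, following the layout
// /var/discovery/clusterInstances/<slingId>/announcements/<announcerSlingId>.
final class DiscoveryPaths {
    static final String CLUSTER_INSTANCES = "/var/discovery/clusterInstances";

    private DiscoveryPaths() {}

    /** Returns the announcement node path for the given instance and announcer. */
    static String announcementPath(String slingId, String announcerSlingId) {
        return CLUSTER_INSTANCES + "/" + slingId + "/announcements/" + announcerSlingId;
    }
}
```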
Note that for such an Oak-based discovery.impl to work, this Oak lease mechanism must be very robust (which it should be in its own interest already). However, there are currently a few issues that should probably be resolved before discovery can be based on it: OAK-2739, OAK-2682 and OAK-2681 are currently known in this area.
Issue Links
- is blocked by
  - OAK-2681 Update lease without holding lock (Closed)
  - OAK-2682 Introduce time difference detection for DocumentNodeStore (Closed)
  - OAK-2739 take appropriate action when lease cannot be renewed (in time) (Closed)
  - SLING-5173 factor away reusable base (incl connectors) of discovery.impl into discovery.base (Closed)
- relates to
  - SLING-2939 3rd-party based implementation of discovery.api (Resolved)
- requires
  - OAK-2844 Introducing a simple document-based discovery-light service (to circumvent documentMk's eventual consistency delays) (Closed)
- supersedes
  - SLING-4640 Possibility of duplicate leaders w/discovery.impl on eventually consistent repo (Closed)