Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Discovery Commons 1.0.24, Discovery Base 2.0.10, Discovery Oak 1.2.34
None
Description
Discovery.oak requires that both Oak and Sling are operating normally in order to declare victory and announce a new topology.
The startup phase is especially tricky in this regard, since several elements need to be updated, some in the Oak layer and some in Sling:
- lease & clusterNodeId : this is maintained by Oak
- idMap : this is maintained by IdMapService (Sling)
- leaderElectionId : this is maintained by OakViewChecker (Sling)
- syncToken : this is maintained by SyncTokenService (Sling)
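To make the dependency on all four elements concrete, here is a minimal sketch of a readiness check. It is not the discovery.oak implementation: the class `StartupSnapshot`, its fields and its methods are hypothetical names chosen for illustration, and it assumes each element can be reduced to "present or not" for a given instance.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of what a joining instance has published so far. */
final class StartupSnapshot {
    final Integer clusterNodeId;   // maintained by Oak (null if not yet assigned)
    final boolean leaseValid;      // maintained by Oak
    final String idMapEntry;       // maintained by IdMapService (Sling)
    final String leaderElectionId; // maintained by OakViewChecker (Sling)
    final String syncToken;        // maintained by SyncTokenService (Sling)

    StartupSnapshot(Integer clusterNodeId, boolean leaseValid, String idMapEntry,
                    String leaderElectionId, String syncToken) {
        this.clusterNodeId = clusterNodeId;
        this.leaseValid = leaseValid;
        this.idMapEntry = idMapEntry;
        this.leaderElectionId = leaderElectionId;
        this.syncToken = syncToken;
    }

    /** Names of the elements that have not been written yet, in the order listed above. */
    List<String> missing() {
        List<String> result = new ArrayList<>();
        if (clusterNodeId == null || !leaseValid) result.add("lease & clusterNodeId");
        if (idMapEntry == null) result.add("idMap");
        if (leaderElectionId == null) result.add("leaderElectionId");
        if (syncToken == null) result.add("syncToken");
        return result;
    }

    /** An instance counts as fully started only when nothing is missing. */
    boolean isFullyStarted() {
        return missing().isEmpty();
    }
}
```

An instance where Oak is up but no Sling bundle has been activated would report `idMap`, `leaderElectionId` and `syncToken` as missing, which is the partial-startup case described in this issue.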
Situations have been seen where Oak starts up fine, but higher-level (e.g. Sling) bundles were not activated within a reasonable amount of time. This led to discovery staying in the TOPOLOGY_CHANGING state for longer than expected.
There should be a mechanism that ignores (suppresses) newly joining instances if they start up only partially. However, after a certain timeout this mechanism should give up.
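A rough sketch of how such a suppression with timeout could look is given below. It only illustrates the idea and is not the actual fix: `JoinerSuppressor`, `shouldSuppress` and the timeout parameter are hypothetical names, and the caller is assumed to already know whether an instance is fully started. Time is passed in explicitly so that the behaviour is easy to test.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker: suppress partially started joiners, but only up to a timeout. */
final class JoinerSuppressor {
    private final long timeoutMillis;
    // clusterNodeId -> time at which the instance was first seen partially started
    private final Map<Integer, Long> firstSeenPartial = new HashMap<>();

    JoinerSuppressor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Returns true if the instance should be ignored (suppressed) for now.
     * Fully started instances are never suppressed; partially started ones
     * are suppressed until timeoutMillis has elapsed since they were first seen.
     */
    boolean shouldSuppress(int clusterNodeId, boolean fullyStarted, long nowMillis) {
        if (fullyStarted) {
            firstSeenPartial.remove(clusterNodeId); // forget any earlier partial state
            return false;
        }
        Long previous = firstSeenPartial.putIfAbsent(clusterNodeId, nowMillis);
        long firstSeen = (previous == null) ? nowMillis : previous;
        // Give up once the timeout has elapsed: stop suppressing.
        return nowMillis - firstSeen < timeoutMillis;
    }
}
```

With a timeout of 1000 ms, an instance first seen partially started at t=0 is suppressed at t=999 but no longer at t=1000. What happens once suppression ends is outside the scope of this sketch.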