Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Motivation
Some events may have been propagated to ms.logicalTopology, but the node restarted while we were updating topologyAugmentationMap and other states in DistributionZoneManager#createMetastorageTopologyListener. In that case either the augmentation that should have been added to zone.topologyAugmentationMap was not added and we need to recover this information, or nodesAttributes were not propagated to MS.
Definition of done
On a node restart, all states that were going to be updated during a watch event in DistributionZoneManager#createMetastorageTopologyListener must be recovered.
Implementation notes
(outdated, see UPD)
For every zone, compare MS.local.logicalTopology.revision with max(maxScUpFromMap, maxScDownFromMap). If logicalTopology.revision is greater than max(maxScUpFromMap, maxScDownFromMap), some topology changes were not propagated to topologyAugmentationMap before the restart and the corresponding timers were not scheduled.
To fill the gap in topologyAugmentationMap, compare MS.local.logicalTopology with lastSeenTopology and add to topologyAugmentationMap the nodes that did not reach it before the restart.
lastSeenTopology is calculated as follows: read MS.local.dataNodes, take max(scaleUpTriggerKey, scaleDownTriggerKey), and retrieve all additions and removals of nodes from topologyAugmentationMap using max(scaleUpTriggerKey, scaleDownTriggerKey) as the left bound. Then apply these changes to the map of node counters from MS.local.dataNodes and keep only the nodes with positive counters. The result is lastSeenTopology.
Comparing lastSeenTopology with MS.local.logicalTopology shows which node additions and removals were not propagated to topologyAugmentationMap before the restart. Add these differences to topologyAugmentationMap, using MS.local.logicalTopology.revision as the revision (the key in topologyAugmentationMap). Using this revision is safe: if a node was added to ms.topology after an immediate data nodes recalculation, that added node must restore the intent of that immediate recalculation.
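The outdated steps above can be sketched roughly as follows. This is only an illustration, not the Ignite code: the class TopologyGapSketch, the Augmentation shape (separate added and removed sets), node names as plain strings, and treating the left bound as exclusive are all assumptions made for the example. The initial revision check against max(maxScUpFromMap, maxScDownFromMap) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; none of these types are the real Ignite classes. */
class TopologyGapSketch {
    /** Assumed shape of one topologyAugmentationMap value: nodes added and removed at a revision. */
    static final class Augmentation {
        final Set<String> added;
        final Set<String> removed;

        Augmentation(Set<String> added, Set<String> removed) {
            this.added = added;
            this.removed = removed;
        }
    }

    /** Rebuilds lastSeenTopology from data node counters plus the not yet applied augmentations. */
    static Set<String> lastSeenTopology(
            Map<String, Integer> dataNodeCounters,
            NavigableMap<Long, Augmentation> augmentationMap,
            long scaleUpTriggerKey,
            long scaleDownTriggerKey) {
        long leftBound = Math.max(scaleUpTriggerKey, scaleDownTriggerKey);
        Map<String, Integer> counters = new HashMap<>(dataNodeCounters);

        // Assumption: the left bound is exclusive, i.e. entries at leftBound are already applied.
        for (Augmentation aug : augmentationMap.tailMap(leftBound, false).values()) {
            for (String node : aug.added) {
                counters.merge(node, 1, Integer::sum);
            }
            for (String node : aug.removed) {
                counters.merge(node, -1, Integer::sum);
            }
        }

        // Keep only the nodes whose counter is positive.
        Set<String> lastSeen = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counters.entrySet()) {
            if (e.getValue() > 0) {
                lastSeen.add(e.getKey());
            }
        }
        return lastSeen;
    }

    /** Adds the difference between lastSeenTopology and the logical topology under its revision. */
    static void fillGap(
            NavigableMap<Long, Augmentation> augmentationMap,
            Set<String> lastSeen,
            Set<String> logicalTopology,
            long logicalTopologyRevision) {
        Set<String> added = new TreeSet<>(logicalTopology);
        added.removeAll(lastSeen);
        Set<String> removed = new TreeSet<>(lastSeen);
        removed.removeAll(logicalTopology);

        if (!added.isEmpty() || !removed.isEmpty()) {
            augmentationMap.put(logicalTopologyRevision, new Augmentation(added, removed));
        }
    }
}
```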
UPD: The implementation notes above are outdated; we implemented a slightly different approach. We now save the last handled topology to MS. On restart we restore global states from the local metastorage and check whether the current ms.logicalTopology differs from the one that was handled in DistributionZoneManager#createMetastorageTopologyListener (we compare the revisions of these events). If it differs, we repeat the logic from DistributionZoneManager#createMetastorageTopologyListener with the new logical topology from ms.logicalTopology.
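A minimal sketch of the restart check described in the UPD, under stated assumptions: TopologyRecoverySketch, VersionedTopology and recoverOnRestart are hypothetical names, the listener logic is passed in as a callback, and a missing last handled topology is assumed to mean that nothing has been handled yet. The real DistributionZoneManager code may be structured differently.

```java
import java.util.Set;
import java.util.function.BiConsumer;

/** Hypothetical sketch; not the real DistributionZoneManager code. */
class TopologyRecoverySketch {
    /** A logical topology together with the metastorage revision at which it was written. */
    static final class VersionedTopology {
        final Set<String> nodes;
        final long revision;

        VersionedTopology(Set<String> nodes, long revision) {
            this.nodes = nodes;
            this.revision = revision;
        }
    }

    /**
     * Replays the topology listener logic if the current logical topology is newer
     * than the last handled one saved to the metastorage.
     *
     * @return true if the listener logic was replayed.
     */
    static boolean recoverOnRestart(
            VersionedTopology lastHandled,
            VersionedTopology current,
            BiConsumer<Set<String>, Long> topologyListenerLogic) {
        // Assumption: null means no topology event was handled before the restart.
        if (lastHandled != null && current.revision <= lastHandled.revision) {
            return false; // Everything was already handled before the restart.
        }

        topologyListenerLogic.accept(current.nodes, current.revision);
        return true;
    }
}
```

Comparing revisions rather than node sets follows the note above that the revisions of the events are checked; it also covers the case where a node left and rejoined, so the sets match but the events were still missed.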
Attachments
Issue Links
- blocks IGNITE-19491 Proper utilisation of a distribution zone manager's ZoneState#topologyAugmentationMap. (Open)