Ignite / IGNITE-20603

Restore logical topology change event on a node restart

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.0
    • None

    Description

      Motivation

      It is possible that some events were propagated to ms.logicalTopology, but a restart happened while we were updating topologyAugmentationMap and other states in DistributionZoneManager#createMetastorageTopologyListener. In that case, an augmentation that should have been added to zone.topologyAugmentationMap was not added, or nodesAttributes was not propagated to MS, and this information needs to be recovered.

      Definition of done

      On a node restart, all states that were supposed to be updated while handling the watch event in DistributionZoneManager#createMetastorageTopologyListener must be recovered.

      Implementation notes

      (outdated, see UPD)

      For every zone, compare MS.local.logicalTopology.revision with max(maxScUpFromMap, maxScDownFromMap). If logicalTopology.revision is greater than max(maxScUpFromMap, maxScDownFromMap), some topology changes were not propagated to topologyAugmentationMap before the restart, and the corresponding timers were not scheduled.

      To fill the gap in topologyAugmentationMap, compare MS.local.logicalTopology with lastSeenTopology and add to topologyAugmentationMap the nodes that did not make it into the map before the restart.

      lastSeenTopology is calculated in the following way: read MS.local.dataNodes, take max(scaleUpTriggerKey, scaleDownTriggerKey), and retrieve all additions and removals of nodes from topologyAugmentationMap using max(scaleUpTriggerKey, scaleDownTriggerKey) as the left bound. Then apply these changes to the map of node counters from MS.local.dataNodes and keep only the nodes with positive counters. The result is lastSeenTopology.

      Comparing lastSeenTopology with MS.local.logicalTopology shows which node additions and removals were not propagated to topologyAugmentationMap before the restart. Take these differences and add them to topologyAugmentationMap, using MS.local.logicalTopology.revision as the revision (the key for topologyAugmentationMap). It is safe to use this revision: if a node was added to the ms.topology after an immediate data nodes recalculation, the added node must restore that immediate recalculation intent.
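
      The steps of this (outdated) approach can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from DistributionZoneManager: the class TopologyGapSketch, the Augmentation type, the method names and the use of plain node-name strings are all hypothetical, and treating the trigger revision as an exclusive left bound is an assumption, since the description does not say whether the bound is inclusive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the outdated recovery idea; all names are illustrative, not Ignite APIs.
class TopologyGapSketch {
    /** Nodes added to (addition == true) or removed from the topology at one revision. */
    static final class Augmentation {
        final Set<String> nodes;
        final boolean addition;

        Augmentation(Set<String> nodes, boolean addition) {
            this.nodes = nodes;
            this.addition = addition;
        }
    }

    /** True if the logical topology is newer than anything recorded in the augmentation map. */
    static boolean hasGap(long logicalTopologyRevision, long maxScUpFromMap, long maxScDownFromMap) {
        return logicalTopologyRevision > Math.max(maxScUpFromMap, maxScDownFromMap);
    }

    /**
     * Rebuilds lastSeenTopology: starts from the data node counters, applies every augmentation
     * with a revision strictly greater than the trigger revision (the exclusive bound is an
     * assumption), and keeps only the nodes whose counter stays positive.
     */
    static Set<String> lastSeenTopology(
            Map<String, Integer> dataNodeCounters,
            NavigableMap<Long, Augmentation> augmentationMap,
            long maxTriggerRevision) {
        Map<String, Integer> counters = new HashMap<>(dataNodeCounters);
        for (Augmentation aug : augmentationMap.tailMap(maxTriggerRevision, false).values()) {
            for (String node : aug.nodes) {
                counters.merge(node, aug.addition ? 1 : -1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counters.entrySet()) {
            if (e.getValue() > 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Nodes present in left but absent from right. */
    static Set<String> difference(Set<String> left, Set<String> right) {
        Set<String> diff = new TreeSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    public static void main(String[] args) {
        Map<String, Integer> dataNodes = new HashMap<>();
        dataNodes.put("A", 1);
        dataNodes.put("B", 1);

        NavigableMap<Long, Augmentation> augmentationMap = new TreeMap<>();
        augmentationMap.put(5L, new Augmentation(Set.of("C"), true));
        augmentationMap.put(7L, new Augmentation(Set.of("B"), false));

        Set<String> lastSeen = lastSeenTopology(dataNodes, augmentationMap, 4L);
        Set<String> logicalTopology = Set.of("A", "C", "D"); // made-up topology read at revision 9

        if (hasGap(9L, 5L, 7L)) {
            // These differences would be stored in the augmentation map under revision 9.
            System.out.println("lastSeenTopology = " + lastSeen);
            System.out.println("missed additions = " + difference(logicalTopology, lastSeen));
            System.out.println("missed removals  = " + difference(lastSeen, logicalTopology));
        }
    }
}
```

      In the sketch, the missed additions and removals are the entries that would be written to topologyAugmentationMap under MS.local.logicalTopology.revision.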

      UPD: The implementation notes above are outdated; a slightly different approach was implemented. The last handled topology is now saved to MS. On restart, global states are restored from the local metastorage, and the current ms.logicalTopology is compared with the one that was last handled in DistributionZoneManager#createMetastorageTopologyListener (by comparing the revisions of these events). If they differ, the logic from DistributionZoneManager#createMetastorageTopologyListener is repeated with the new logical topology from ms.logicalTopology.
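
      A minimal sketch of the implemented idea, under stated assumptions, might look like the following. The class TopologyRecoverySketch, the VersionedTopology type, the two string keys and the in-memory map standing in for the metastorage are all hypothetical; the real listener also updates the augmentation map, timers and node attributes, which is only hinted at by a comment here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; keys, class and method names are illustrative, not Ignite APIs.
class TopologyRecoverySketch {
    static final String LOGICAL_TOPOLOGY_KEY = "logicalTopology";
    static final String LAST_HANDLED_TOPOLOGY_KEY = "lastHandledTopology";

    /** A logical topology together with the revision of the event that produced it. */
    static final class VersionedTopology {
        final long revision;
        final Set<String> nodes;

        VersionedTopology(long revision, Set<String> nodes) {
            this.revision = revision;
            this.nodes = nodes;
        }
    }

    /** Stand-in for the metastorage; it survives a "restart" of the manager. */
    private final Map<String, VersionedTopology> metaStorage;

    /** Revisions processed by this instance, kept only to make the sketch observable. */
    final List<Long> handledRevisions = new ArrayList<>();

    TopologyRecoverySketch(Map<String, VersionedTopology> metaStorage) {
        this.metaStorage = metaStorage;
    }

    /** Mimics the topology listener: update in-memory states, then persist what was handled. */
    void onLogicalTopologyChanged(VersionedTopology topology) {
        // Real code would update the augmentation map, timers and node attributes here.
        handledRevisions.add(topology.revision);
        metaStorage.put(LAST_HANDLED_TOPOLOGY_KEY, topology);
    }

    /** On restart, replays the listener logic if the stored topology is newer than the last handled one. */
    boolean recoverOnRestart() {
        VersionedTopology current = metaStorage.get(LOGICAL_TOPOLOGY_KEY);
        VersionedTopology lastHandled = metaStorage.get(LAST_HANDLED_TOPOLOGY_KEY);
        if (current == null) {
            return false; // nothing has been written to the logical topology yet
        }
        if (lastHandled != null && lastHandled.revision >= current.revision) {
            return false; // the latest topology event was fully handled before the restart
        }
        onLogicalTopologyChanged(current);
        return true;
    }

    public static void main(String[] args) {
        Map<String, VersionedTopology> ms = new HashMap<>();
        // The topology event at revision 10 reached the metastorage, but only revision 8 was handled.
        ms.put(LOGICAL_TOPOLOGY_KEY, new VersionedTopology(10L, Set.of("A", "B", "C")));
        ms.put(LAST_HANDLED_TOPOLOGY_KEY, new VersionedTopology(8L, Set.of("A", "B")));

        TopologyRecoverySketch restarted = new TopologyRecoverySketch(ms);
        System.out.println("replayed on restart: " + restarted.recoverOnRestart());
        System.out.println("replayed again: " + restarted.recoverOnRestart());
    }
}
```

      Persisting the handled topology after the in-memory update means a crash between the two steps leaves the stored revision behind, so the event is replayed on the next start rather than lost.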

            People

              Assignee: Mirza Aliev (maliev)
              Reporter: Mirza Aliev (maliev)
              Sergey Uttsel

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 10m