IGNITE-20310

Meta storage invokes are not completed when DZM start is completed

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.0.0-beta2
    • None

    Description

      Motivation

      DistributionZoneManager performs meta storage invokes during its start. Currently these invokes are issued from DistributionZoneManager#createOrRestoreZoneState:

      1. DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to init the default zone.
      2. DistributionZoneManager#restoreTimers, for the case where a filter update was handled before DZM stopped but the data nodes were not updated.

      The futures returned by these invokes are ignored, so when the start method returns, not all start actions have actually completed. This can lead to the following situation:

      • Initialisation of the default zone hangs for some reason, even after a full restart of the cluster.
      • That means that none of the data nodes related keys in the meta storage have been initialised.
      • For example, if a user adds a new node and the scale up timer is immediate, which should trigger an immediate data nodes recalculation, the recalculation won't happen because the data nodes key has not been initialised.
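
      The situation above can be reproduced with a toy model. This is only a sketch: the class, the key name and the map-based "meta storage" are hypothetical stand-ins, not Ignite classes. It assumes the data nodes recalculation is a conditional update that is skipped when the key is absent.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the race; none of these types are real Ignite classes. */
class IgnoredInvokeSketch {
    /** Hypothetical key name for the default zone's data nodes. */
    static final String DATA_NODES_KEY = "zone.default.dataNodes";

    /** Hypothetical stand-in for the meta storage: a plain key-value map. */
    final Map<String, Set<String>> metaStorage = new ConcurrentHashMap<>();

    /** Completed manually to simulate an init invoke that is still in flight (or hung). */
    final CompletableFuture<Void> initInvoke = new CompletableFuture<>();

    /** Mirrors the problematic start: the invoke is scheduled and its future is dropped. */
    void start() {
        initInvoke.thenRun(() -> metaStorage.putIfAbsent(DATA_NODES_KEY, Set.of()));
        // Returns immediately; nothing waits for initInvoke.
    }

    /** Conditional update: recalculates data nodes only if the key was initialised. */
    boolean onNodeAdded(String node) {
        return metaStorage.computeIfPresent(DATA_NODES_KEY, (key, nodes) -> {
            Set<String> updated = new HashSet<>(nodes);
            updated.add(node);
            return Set.copyOf(updated);
        }) != null;
    }

    public static void main(String[] args) {
        IgnoredInvokeSketch dzm = new IgnoredInvokeSketch();
        dzm.start();
        // start() has returned, but the init invoke has not completed yet.
        System.out.println("recalculated before init: " + dzm.onNodeAdded("node-2")); // false
        dzm.initInvoke.complete(null);
        System.out.println("recalculated after init: " + dzm.onNodeAdded("node-2")); // true
    }
}
```

      In the sketch the first scale up is silently lost, which is the behaviour the ticket describes for a node that is considered started while the init invoke is still pending.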

      Possible solutions

      Easier

      We just need to wait within DistributionZoneManager#start for all async logic to complete, using ms.invoke().get().
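
      A minimal sketch of the blocking approach, assuming each invoke returns a CompletableFuture. The invoke helper and the log list below are hypothetical and only simulate asynchronous completion; they are not the real meta storage API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;

/** Toy model of the "easier" fix; none of these types are real Ignite classes. */
class BlockingStartSketch {
    /** Hypothetical stand-in for a meta storage invoke that completes on another thread. */
    static CompletableFuture<Boolean> invoke(String description, List<String> log) {
        return CompletableFuture.supplyAsync(() -> {
            log.add(description);
            return true;
        });
    }

    /** Blocks on a future with get() and converts checked exceptions to unchecked ones. */
    static <T> T await(CompletableFuture<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during start", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Invoke failed during start", e.getCause());
        }
    }

    /** start() does not return until every invoke it issued has completed. */
    static void start(List<String> log) {
        await(invoke("init default zone data nodes", log));
        await(invoke("restore timers", log));
        log.add("start completed");
    }

    public static void main(String[] args) {
        List<String> log = new CopyOnWriteArrayList<>();
        start(log);
        // Prints: [init default zone data nodes, restore timers, start completed]
        System.out.println(log);
    }
}
```

      The trade-off is that the starting thread is blocked for as long as the invokes take, so a hung invoke now hangs the start itself instead of being silently ignored.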

      Harder

      We can enhance IgniteComponent#start so that it returns a CompletableFuture, and then change the component start flow so that a node is not ready to work until all IgniteComponent#start futures are completed. For example, we can chain these futures in IgniteImpl#recoverComponentsStateOnStart, so that the components' futures are completed before metaStorageMgr.deployWatches().
      In DistributionZoneManager#start we can return CompletableFuture.allOf over the futures that must be completed as part of DistributionZoneManager#start.
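
      The shape of this approach can be sketched as follows. The Component interface, ZoneManager class and startNode method are hypothetical simplifications, not the actual IgniteComponent or IgniteImpl code; the "deployWatches" log entry only marks the point where watches would be deployed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

/** Toy model of the "harder" fix; none of these types are real Ignite classes. */
class AsyncStartSketch {
    /** Hypothetical component whose start reports completion through a future. */
    interface Component {
        CompletableFuture<Void> start();
    }

    /** Hypothetical zone manager: start returns allOf over the invokes it issued. */
    static class ZoneManager implements Component {
        final CompletableFuture<Void> initInvoke = new CompletableFuture<>();
        final CompletableFuture<Void> restoreTimersInvoke = new CompletableFuture<>();

        @Override
        public CompletableFuture<Void> start() {
            return CompletableFuture.allOf(initInvoke, restoreTimersInvoke);
        }
    }

    /** Starts every component, then deploys watches only after all start futures complete. */
    static CompletableFuture<Void> startNode(List<Component> components, List<String> log) {
        List<CompletableFuture<Void>> startFutures = new ArrayList<>();
        for (Component component : components) {
            startFutures.add(component.start());
        }
        return CompletableFuture.allOf(startFutures.toArray(new CompletableFuture[0]))
                .thenRun(() -> log.add("deployWatches"));
    }

    public static void main(String[] args) {
        ZoneManager zoneManager = new ZoneManager();
        List<String> log = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> nodeReady = startNode(List.of(zoneManager), log);

        System.out.println("ready while invokes pending: " + nodeReady.isDone()); // false
        zoneManager.initInvoke.complete(null);
        zoneManager.restoreTimersInvoke.complete(null);
        System.out.println("ready after invokes: " + nodeReady.isDone() + " " + log); // true [deployWatches]
    }
}
```

      Unlike the blocking variant, no thread waits here: readiness is expressed by the returned future, and the step that deploys watches is chained onto it.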

      Definition of done

      All asynchronous logic in the DistributionZoneManager#start is done before a node is ready to work, in particular, ready to interact with zones.

      UPD:
      We decided to implement the easier way. The harder way will be implemented in a separate ticket: https://issues.apache.org/jira/browse/IGNITE-20477

            People

              Mirza Aliev (maliev)
              Sergey Uttsel
              Vladislav Pyatkov
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h