Ratis / RATIS-1481

notifyStateMachineToInstallSnapshot stuck in IN_PROGRESS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: server
    • Labels: None

    Description

      In an Ozone cluster, the OM fails to install a snapshot. According to the OM log, the OM state machine has completed its part (e.g. downloading the checkpoint, installing it, and loading it).

      First, the call

      stateMachine.followerEvent().notifyInstallSnapshotFromLeader(roleInfoProto, firstAvailableLogTermIndex).whenComplete(...)

      is an asynchronous CompletableFuture action. Normally, the follower should receive a later InstallSnapshot request and reply once it has installed the snapshot. However, I found that the leader stops sending InstallSnapshot requests.
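To illustrate why the reply to the first notification cannot reflect the finished installation, here is a minimal sketch. It is not Ratis code: the class `AsyncInstallSketch`, the `Result` enum, and the method `onNotifyInstallSnapshot` are hypothetical stand-ins, and only `CompletableFuture.whenComplete` is a real API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

final class AsyncInstallSketch {
  // Hypothetical stand-in for the follower's installed snapshot index; -1 means nothing installed yet.
  static final AtomicLong installedSnapshotIndex = new AtomicLong(-1);

  // Hypothetical reply values, loosely named after the results mentioned in this issue.
  enum Result { IN_PROGRESS, SNAPSHOT_INSTALLED }

  /** Registers the callback and returns at once; the installation finishes later. */
  static Result onNotifyInstallSnapshot(CompletableFuture<Long> installFuture) {
    installFuture.whenComplete((index, error) -> {
      if (error == null) {
        installedSnapshotIndex.set(index);
      }
    });
    // The reply is computed from the current state, not from the future's eventual result.
    return installedSnapshotIndex.get() >= 0 ? Result.SNAPSHOT_INSTALLED : Result.IN_PROGRESS;
  }

  public static void main(String[] args) {
    installedSnapshotIndex.set(-1);
    CompletableFuture<Long> install = new CompletableFuture<>();
    System.out.println(onNotifyInstallSnapshot(install)); // prints IN_PROGRESS: install still pending
    install.complete(100L);                               // the state machine finishes afterwards
    System.out.println(installedSnapshotIndex.get());     // prints 100
  }
}
```

In this model the follower can only report the installed snapshot when a further request arrives after the future has completed, which is the step that goes missing in this issue.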

       

      In the whenComplete stage, the following statements are executed, which update the snapshot index and the commit index:

      stateMachine.pause();
      state.updateInstalledSnapshotIndex(reply);
      state.reloadStateMachine(reply.getIndex());
      installedSnapshotIndex.set(reply.getIndex()); 

       

       

      During appendEntriesAsync, checkInconsistentAppendEntries reports an inconsistency because the snapshot installation is still in progress. Once the snapshot index and commit index have actually been updated, the leader receives an inconsistency reply carrying the new index and stops sending InstallSnapshot requests, because shouldNotifyToInstallSnapshot() now returns null.
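The leader-side decision can be sketched as follows, assuming it reduces to comparing the follower's next index with the leader's first available log index. The class `NotifyDecisionSketch` and the two-parameter signature are hypothetical; the real method in Ratis may check more conditions.

```java
final class NotifyDecisionSketch {
  /**
   * Returns the leader's first available log index when a snapshot notification
   * is needed, or null when the follower no longer looks behind (assumed model).
   */
  static Long shouldNotifyToInstallSnapshot(long followerNextIndex, long leaderStartIndex) {
    if (followerNextIndex < leaderStartIndex) {
      return leaderStartIndex;
    }
    return null;
  }

  public static void main(String[] args) {
    // Before the install: the follower is behind the leader's log start, so a notification is due.
    System.out.println(shouldNotifyToInstallSnapshot(10L, 100L));  // prints 100
    // After the inconsistency reply carries the new index: no notification any more.
    System.out.println(shouldNotifyToInstallSnapshot(101L, 100L)); // prints null
  }
}
```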

       

      Meanwhile, because the CompletableFuture completes asynchronously, the follower raft server has not yet sent SNAPSHOT_INSTALLED to the leader in response to the previous InstallSnapshot request, and it will not receive any further requests. This leads to an infinite loop of failed appendEntries calls, while the InstallSnapshot progress is never completed.
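The resulting loop can be reproduced in a small deterministic model. This is a sketch under stated assumptions, not the Ratis implementation: all names (`StuckInstallSketch`, `installInProgress`, `onNotify`, `onAppendEntries`, `leaderShouldNotify`) are hypothetical, the in-progress flag is assumed to be cleared only by a later notification, and the leader reads the follower's next index directly instead of from a reply.

```java
import java.util.concurrent.CompletableFuture;

final class StuckInstallSketch {
  // Follower state (hypothetical and heavily simplified).
  static boolean installInProgress;    // set when the first notification arrives
  static long installedSnapshotIndex;  // -1 until the whenComplete callback runs
  static long followerNextIndex;

  /** Follower side: handles a notification; true means it would reply SNAPSHOT_INSTALLED. */
  static boolean onNotify(CompletableFuture<Long> install) {
    if (installedSnapshotIndex >= 0) {
      installInProgress = false;       // assumption: only a later notification clears the flag
      return true;
    }
    if (!installInProgress) {
      installInProgress = true;
      install.whenComplete((index, error) -> {
        if (error == null) {
          installedSnapshotIndex = index;
          followerNextIndex = index + 1;
        }
      });
    }
    return false;                      // would reply IN_PROGRESS
  }

  /** Follower side: appendEntries is rejected as inconsistent while the flag is set. */
  static boolean onAppendEntries() {
    return !installInProgress;
  }

  /** Leader side: assumed to notify only while the follower is behind the leader's log start. */
  static boolean leaderShouldNotify(long leaderStartIndex) {
    return followerNextIndex < leaderStartIndex;
  }

  /** Runs the scenario and returns how many appendEntries rounds fail after the install completes. */
  static int failedAppendsAfterInstall(int rounds) {
    installInProgress = false;
    installedSnapshotIndex = -1;
    followerNextIndex = 0;
    final long leaderStartIndex = 100;
    final CompletableFuture<Long> install = new CompletableFuture<>();

    if (leaderShouldNotify(leaderStartIndex)) {
      onNotify(install);                 // first notification: the install starts
    }
    install.complete(leaderStartIndex);  // callback runs: indices jump past the leader's log start

    int failed = 0;
    for (int i = 0; i < rounds; i++) {
      if (leaderShouldNotify(leaderStartIndex)) {
        onNotify(install);               // would clear the flag, but is never reached
      } else if (!onAppendEntries()) {
        failed++;
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    System.out.println(failedAppendsAfterInstall(5)); // prints 5: every round fails
  }
}
```

Under these assumptions, no number of retries helps: the only event that would clear the flag is a notification that the leader has already decided not to send.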

            People

              Assignee: Nibiruxu Xu Shao Hong
              Reporter: Nibiruxu Xu Shao Hong
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 6h 50m