Ratis / RATIS-1577

Install snapshot failure

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: server
    • Labels: None

    Description

      Found this while testing Ozone HA with Ratis built from current master:

      scm2_1      | 2022-05-08 09:36:23,915 [grpc-default-executor-0] INFO impl.SnapshotInstallationHandler: dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: receive installSnapshot: 517d4b01-482b-4891-b0de-80441d4eb637->dd69b21b-bf75-4092-bd68-19f526eb2628#0-t2,notify:(t:1, i:0)
      scm2_1      | 2022-05-08 09:36:23,923 [grpc-default-executor-0] INFO server.RaftServer$Division: dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: change Leader from null to 517d4b01-482b-4891-b0de-80441d4eb637 at term 2 for installSnapshot, leader elected after 2809ms
      scm2_1      | 2022-05-08 09:36:23,927 [grpc-default-executor-0] INFO impl.SnapshotInstallationHandler: dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: Received notification to install snapshot at index 0
      scm2_1      | 2022-05-08 09:36:23,958 [grpc-default-executor-0] INFO impl.SnapshotInstallationHandler: dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: notifyInstallSnapshot: nextIndex is 0 but the leader's first available index is 0.
      scm2_1      | 2022-05-08 09:36:23,960 [grpc-default-executor-0] INFO ha.SCMStateMachine: Received install snapshot notification from SCM leader: scm1:9894 with term index: (t:1, i:0)
      scm2_1      | 2022-05-08 09:36:23,961 [pool-16-thread-1] INFO ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and reloading state from the checkpoint.
      scm2_1      | 2022-05-08 09:36:23,965 [grpc-default-executor-0] ERROR impl.SnapshotInstallationHandler: dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: installSnapshot failed
      scm2_1      | java.lang.IllegalStateException: inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
      scm2_1      | 	at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
      scm2_1      | 	at org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
      scm2_1      | 	at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
      scm2_1      | 	at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
      scm2_1      | 	at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
      

      The log above is from a docker-compose cluster, but the failure is also easy to reproduce with the MiniCluster-based integration test:

      mvn clean test -DfailIfNoTests=false -Dtest=TestStorageContainerManagerHA
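
      When triaging output from either environment, it can help to pull the two indices out of the exception message mechanically. The following is a minimal sketch, not part of Ratis or Ozone: the class and method names are hypothetical, and the regular expression assumes the message format and the docker-compose "service | " prefix exactly as they appear in the log above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper; assumes the log format shown in this report. */
public class InstallSnapshotLogTriage {
  // Message format copied from the IllegalStateException in the log above.
  private static final Pattern NOT_ELIGIBLE = Pattern.compile(
      "inProgressInstallSnapshotRequest: (-?\\d+) is not eligible, "
          + "firstAvailableLogIndex: (-?\\d+)");

  /** Removes a docker-compose "service | " prefix if the line has one. */
  public static String stripComposePrefix(String line) {
    return line.replaceFirst("^\\s*\\S+\\s*\\| ?", "");
  }

  /**
   * Returns {inProgressIndex, firstAvailableLogIndex} when the line carries
   * the "is not eligible" message, or an empty Optional otherwise.
   */
  public static Optional<long[]> parseNotEligible(String line) {
    Matcher m = NOT_ELIGIBLE.matcher(stripComposePrefix(line));
    if (!m.find()) {
      return Optional.empty();
    }
    return Optional.of(new long[] {
        Long.parseLong(m.group(1)), Long.parseLong(m.group(2))});
  }

  public static void main(String[] args) {
    String line = "scm2_1      | java.lang.IllegalStateException: "
        + "inProgressInstallSnapshotRequest: 0 is not eligible, "
        + "firstAvailableLogIndex: 0";
    parseNotEligible(line).ifPresent(v ->
        System.out.println("inProgress=" + v[0] + ", firstAvailable=" + v[1]));
  }
}
```

      For the failing line in this report, both extracted values are 0, which matches the earlier notifyInstallSnapshot message saying that nextIndex and the leader's first available index are both 0.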
      

            People

              Assignee: Attila Doroszlai (adoroszlai)
              Reporter: Attila Doroszlai (adoroszlai)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2.5h