Apache Ozone / HDDS-4224

OM failed to install snapshots after OM failover

Details

    Description

      OM failed to install snapshots after OM failover

      2020-09-09 22:07:13,746 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO  server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(495)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: followerNextIndex = 65949 but logStartIndex = 68440, notify follower to install snapshot-(t:2, i:68440)
      2020-09-09 22:07:13,746 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:notifyStateMachineToInstallSnapshot(1282)) - omNode-2@group-D62218D261DE: Snapshot Installation by StateMachine is in progress.
      2020-09-09 22:07:13,752 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1127)) - omNode-2@group-D62218D261DE: reply installSnapshot: omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
      2020-09-09 22:07:13,746 [grpc-default-executor-51] INFO  server.GrpcLogAppender (GrpcLogAppender.java:onNext(375)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: received a reply omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
      2020-09-09 22:07:13,752 [grpc-default-executor-51] INFO  server.GrpcLogAppender (GrpcLogAppender.java:onNext(392)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: InstallSnapshot in progress.
      2020-09-09 22:07:13,746 [grpc-default-executor-22] INFO  server.GrpcServerProtocolService (GrpcServerProtocolService.java:onCompleted(138)) - omNode-2: Completed INSTALL_SNAPSHOT, lastRequest: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
      2020-09-09 22:07:13,753 [grpc-default-executor-51] INFO  server.GrpcLogAppender (GrpcLogAppender.java:onNext(375)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: received a reply omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
      2020-09-09 22:07:13,753 [grpc-default-executor-51] INFO  server.GrpcLogAppender (GrpcLogAppender.java:onNext(392)) - omNode-1@group-D62218D261DE->omNode-2-InstallSnapshotResponseHandler: InstallSnapshot in progress.
      2020-09-09 22:07:13,752 [grpc-default-executor-52] INFO  server.GrpcServerProtocolService (GrpcServerProtocolService.java:onCompleted(138)) - omNode-2: Completed INSTALL_SNAPSHOT, lastRequest: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
      2020-09-09 22:07:13,747 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO  server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(503)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: send omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
      2020-09-09 22:07:13,756 [pool-144-thread-1] ERROR om.OzoneManager (OzoneManager.java:installCheckpoint(3178)) - Failed to stop/ pause the services. Cannot proceed with installing the new checkpoint.
      2020-09-09 22:07:13,759 [pool-144-thread-1] ERROR om.OzoneManager (OzoneManager.java:installSnapshotFromLeader(3141)) - Failed to install snapshot from Leader OM: {}
      java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:omNode-2:group-D62218D261DE, PAUSED -> PAUSING
              at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63)
              at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115)
              at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155)
              at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.pause(OzoneManagerStateMachine.java:305)
              at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3176)
              at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3162)
              at org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3139)
              at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$notifyInstallSnapshotFromLeader$4(OzoneManagerStateMachine.java:372)
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      2020-09-09 22:07:13,760 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO  server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(495)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: followerNextIndex = 65949 but logStartIndex = 68440, notify follower to install snapshot-(t:2, i:68440)
      2020-09-09 22:07:13,759 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1117)) - omNode-2@group-D62218D261DE: receive installSnapshot: omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
      2020-09-09 22:07:13,765 [org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$380/117485186@47069ab2] INFO  server.GrpcLogAppender (GrpcLogAppender.java:installSnapshot(503)) - omNode-1@group-D62218D261DE->omNode-2-GrpcLogAppender: send omNode-1->omNode-2#0-t2,notify:(t:2, i:68440)
      2020-09-09 22:07:13,765 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:notifyStateMachineToInstallSnapshot(1251)) - omNode-2@group-D62218D261DE: notifyInstallSnapshot: nextIndex is 67621 but the leader's first available index is 68440.
      2020-09-09 22:07:13,766 [grpc-default-executor-52] INFO  ratis.OzoneManagerStateMachine (OzoneManagerStateMachine.java:notifyInstallSnapshotFromLeader(368)) - Received install snapshot notification from OM leader: omNode-1 with term index: (t:2, i:68440)
      2020-09-09 22:07:13,766 [grpc-default-executor-52] INFO  impl.RaftServerImpl (RaftServerImpl.java:installSnapshot(1127)) - omNode-2@group-D62218D261DE: reply installSnapshot: omNode-1<-omNode-2#0:FAIL-t2,IN_PROGRESS
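
The stack trace in this log shows pause() being requested while the state machine is already PAUSED, which the lifecycle check rejects as PAUSED -> PAUSING. To make that failure mode concrete, here is a minimal sketch in plain Java. It is a simplified stand-in and not the Ratis LifeCycle class or OzoneManagerStateMachine: the class PauseGuardSketch, its State enum, and the methods strictPause() and pauseIfRunning() are all hypothetical names introduced for illustration. The guarded variant shows one possible way to tolerate a repeated pause request; whether that is the appropriate fix for this issue is an assumption, not something the report establishes.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified stand-in for a pausable state machine. This is NOT Ratis's
// LifeCycle or Ozone's OzoneManagerStateMachine; all names are hypothetical.
final class PauseGuardSketch {
    enum State { RUNNING, PAUSING, PAUSED }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.RUNNING);

    State getState() {
        return state.get();
    }

    // Strict pause: only RUNNING -> PAUSING is accepted, so calling this on
    // an instance that is already PAUSED throws, as in the trace above.
    void strictPause() {
        if (!state.compareAndSet(State.RUNNING, State.PAUSING)) {
            throw new IllegalStateException(
                    "ILLEGAL TRANSITION: " + state.get() + " -> PAUSING");
        }
        state.set(State.PAUSED);
    }

    // Guarded pause: returns false instead of throwing when the instance
    // is not RUNNING, leaving the current state untouched.
    boolean pauseIfRunning() {
        if (!state.compareAndSet(State.RUNNING, State.PAUSING)) {
            return false;
        }
        state.set(State.PAUSED);
        return true;
    }

    public static void main(String[] args) {
        PauseGuardSketch sm = new PauseGuardSketch();
        sm.strictPause(); // first pause succeeds
        System.out.println("second guarded pause applied: " + sm.pauseIfRunning());
        try {
            sm.strictPause(); // second strict pause is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch a caller that receives repeated install-snapshot notifications could use the guarded variant and treat a false return as "already paused", rather than failing the whole installation.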
      

            People

              Assignee: Unassigned
              Reporter: Mukul Kumar Singh (msingh)