Apache Ozone / HDDS-6900

SCM node went down with UndeclaredThrowableException while running container balancer


Details

    Description

      An SCM node went down with an UndeclaredThrowableException while the container balancer was running and the 2 other SCM nodes were shut down.

      2022-06-15 20:00:15,634 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-GrpcLogAppender: Leader has not got in touch with Follower 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764(c-1,m0,n310, attendVote=true, lastRpcSendTime=1, lastRpcResponseTime=32843) yet, just keep nextIndex unchanged and retry.
      2022-06-15 20:00:16,887 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
      2022-06-15 20:00:16,888 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-GrpcLogAppender: Leader has not got in touch with Follower 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764(c-1,m0,n310, attendVote=true, lastRpcSendTime=4, lastRpcResponseTime=34097) yet, just keep nextIndex unchanged and retry.
      2022-06-15 20:00:18,121 ERROR org.apache.ratis.server.impl.StateMachineUpdater: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-StateMachineUpdater caught a Throwable.
      java.lang.reflect.UndeclaredThrowableException
          at com.sun.proxy.$Proxy19.completeMove(Unknown Source)
          at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.deleteSrcDnForMove(LegacyReplicationManager.java:1249)
          at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.lambda$onLeaderReadyAndOutOfSafeMode$40(LegacyReplicationManager.java:1871)
          at java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603)
          at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.onLeaderReadyAndOutOfSafeMode(LegacyReplicationManager.java:1850)
          at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.notifyStatusChanged(LegacyReplicationManager.java:1649)
          at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.notifyStatusChanged(ReplicationManager.java:375)
          at org.apache.hadoop.hdds.scm.ha.SCMServiceManager.notifyStatusChanged(SCMServiceManager.java:52)
          at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyTermIndexUpdated(SCMStateMachine.java:330)
          at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1566)
          at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
          at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
          at java.base/java.lang.Thread.run(Thread.java:834)
      Caused by: java.util.concurrent.TimeoutException
          at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
          at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
          at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:225)
          at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:111)
          at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
          ... 13 more
      2022-06-15 20:00:18,122 INFO org.apache.ratis.server.RaftServer$Division: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF: shutdown
      2022-06-15 20:00:18,122 INFO org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-0B75F4A309CF,id=99c85376-060f-4b3c-8973-a2d2b1dd23e6
      2022-06-15 20:00:18,122 INFO org.apache.ratis.server.impl.RoleInfo: 99c85376-060f-4b3c-8973-a2d2b1dd23e6: shutdown 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-LeaderStateImpl
      2022-06-15 20:00:18,124 INFO org.apache.ratis.server.impl.PendingRequests: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-PendingRequests: sendNotLeaderResponses
      2022-06-15 20:00:18,125 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->b6382f07-de2e-4986-8275-9146e73360a6-GrpcLogAppender: Wait interrupted by java.lang.InterruptedException
      2022-06-15 20:00:18,128 INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: current leader SCM steps down.
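
      The shape of the trace (a TimeoutException wrapped in an UndeclaredThrowableException thrown from $Proxy19.completeMove) is what java.lang.reflect.Proxy produces when an InvocationHandler throws a checked exception that the proxied interface method does not declare. The sketch below reproduces only that JDK behaviour in isolation. It is not Ozone code: the MoveManager interface, the class name, and the simulated timeout are hypothetical stand-ins chosen for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;
import java.util.concurrent.TimeoutException;

public class UndeclaredThrowableDemo {

  // Hypothetical interface: the method declares no checked exceptions.
  public interface MoveManager {
    void completeMove(String containerId);
  }

  // Calls the method through a dynamic proxy whose handler throws a checked
  // TimeoutException, and returns whatever exception reaches the caller.
  static Throwable invokeThroughProxy() {
    // Stand-in for a handler that times out while waiting on a request.
    InvocationHandler handler = (proxy, method, args) -> {
      throw new TimeoutException("simulated request timeout");
    };

    MoveManager manager = (MoveManager) Proxy.newProxyInstance(
        MoveManager.class.getClassLoader(),
        new Class<?>[] {MoveManager.class},
        handler);

    try {
      manager.completeMove("container-1");
      return null; // not reached: the handler always throws
    } catch (UndeclaredThrowableException e) {
      // The proxy wraps the undeclared checked exception; the original
      // exception is available via getCause().
      return e;
    }
  }

  public static void main(String[] args) {
    Throwable t = invokeThroughProxy();
    System.out.println(t.getClass().getName());
    System.out.println("cause: " + t.getCause().getClass().getName());
  }
}
```

      Because UndeclaredThrowableException is unchecked, a caller that does not catch it explicitly lets it propagate up its thread, which matches the "StateMachineUpdater caught a Throwable" entry followed by the shutdown messages in the log above.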


            People

              siddhant Siddhant Sangwan
              nilotpalnandi Nilotpal Nandi
