  Apache Ozone / HDDS-6241

Non-leader SCM node repeatedly sends requests to the Ratis server

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SCM HA
    • Labels: None

    Description

      In an SCM HA deployment, a non-leader SCM node repeatedly sends requests to the Ratis server. The SCM node has been in this state for many days or even weeks. The SCM log looks like this:

      2022-02-01 11:54:35,413 INFO org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler: Moving container #290631 to CLOSED state, datanode xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx{ip: xx.xx.xxx.xx, host: xxxxxx.xxxxx.xxxxxxxx.com, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} reported CLOSED replica.

      2022-02-01 11:54:35,414 INFO org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler: Invoking method public abstract void org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerState(org.apache.hadoop.hdds.protocol.proto.HddsProtos$ContainerID,org.apache.hadoop.hdds.protocol.proto.HddsProtos$LifeCycleEvent) throws java.io.IOException,org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException on target org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl@23e9a90a, cost 124.728us

      2022-02-01 11:54:35,414 ERROR org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler: Exception while processing ICR for container 290631

      org.apache.ratis.protocol.exceptions.NotLeaderException: Server xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx@group-XXXXXXXXXXXX is not the leader xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx|rpc:xxxxxx.xxxxx.xxxxxxxx.com:9894|admin:|client:|dataStream:|priority:0

              at org.apache.ratis.server.impl.RaftServerImpl.generateNotLeaderException(RaftServerImpl.java:667)

              at org.apache.ratis.server.impl.RaftServerImpl.checkLeaderState(RaftServerImpl.java:632)

              at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:758)

              at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$9(RaftServerProxy.java:437)

              at org.apache.ratis.server.impl.RaftServerProxy.lambda$null$7(RaftServerProxy.java:432)

              at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:115)

              at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$8(RaftServerProxy.java:432)

              at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:995)

              at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2137)

              at org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:431)

              at org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:437)

              at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:222)

              at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)

              at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)

              at com.sun.proxy.$Proxy15.updateContainerState(Unknown Source)

              at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:273)

              at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerState(AbstractContainerReportHandler.java:227)

              at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:96)

              at org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler.onMessage(IncrementalContainerReportHandler.java:88)

              at org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler.onMessage(IncrementalContainerReportHandler.java:40)

              at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)

              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

              at java.lang.Thread.run(Thread.java:748)

      2022-02-01 11:54:35,423 INFO org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close container Event triggered for container : #290631

      2022-02-01 11:54:35,424 WARN org.apache.hadoop.hdds.scm.ha.SCMContext: getTerm is invoked when not leader.

      2022-02-01 11:54:35,424 WARN org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Skip sending close container command, since current SCM is not leader.

      org.apache.ratis.protocol.exceptions.NotLeaderException: Server xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx@group-XXXXXXXXXXXX is not the leader xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx|rpc:xxxxxx.xxxxx.xxxxxxxx.com:9894|admin:|client:|dataStream:|priority:0

              at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.triggerNotLeaderException(SCMRatisServerImpl.java:278)

              at org.apache.hadoop.hdds.scm.ha.SCMContext.getTermOfLeader(SCMContext.java:191)

              at org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:85)

              at org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:50)

              at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)

              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

              at java.lang.Thread.run(Thread.java:748)
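
      The log above shows two different behaviors on the same non-leader node: CloseContainerEventHandler skips its work ("since current SCM is not leader"), while IncrementalContainerReportHandler still submits updateContainerState to Ratis and fails with NotLeaderException. The sketch below only illustrates that contrast with a leader check placed before the replicated update. All names in it (LeaderGuardSketch, ReplicatedUpdate, GuardedReportHandler) are hypothetical stand-ins, not Ozone or Ratis APIs, and the report does not establish that this is the appropriate fix.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names are Ozone or Ratis APIs. */
public class LeaderGuardSketch {

  /** Stand-in for a replicated state update that only a leader may submit. */
  interface ReplicatedUpdate {
    void submit(long containerId);
  }

  /** Handles a container report, skipping the replicated update on followers. */
  static final class GuardedReportHandler {
    private final BooleanSupplier isLeader;   // assumed leadership probe
    private final ReplicatedUpdate update;    // assumed replicated call
    private final AtomicInteger skipped = new AtomicInteger();

    GuardedReportHandler(BooleanSupplier isLeader, ReplicatedUpdate update) {
      this.isLeader = isLeader;
      this.update = update;
    }

    /** Returns true if the update was submitted, false if it was skipped. */
    boolean onReport(long containerId) {
      if (!isLeader.getAsBoolean()) {
        // Follower: do not submit, just count the skipped report.
        skipped.incrementAndGet();
        return false;
      }
      update.submit(containerId);
      return true;
    }

    int skippedCount() {
      return skipped.get();
    }
  }

  public static void main(String[] args) {
    AtomicInteger submitted = new AtomicInteger();
    GuardedReportHandler follower =
        new GuardedReportHandler(() -> false, id -> submitted.incrementAndGet());
    GuardedReportHandler leader =
        new GuardedReportHandler(() -> true, id -> submitted.incrementAndGet());

    follower.onReport(290631L);  // skipped, nothing submitted
    leader.onReport(290631L);    // submitted once

    System.out.println("submitted=" + submitted.get()
        + " skipped=" + follower.skippedCount());
    // prints "submitted=1 skipped=1"
  }
}
```

      A check like this is inherently racy: leadership can change between the check and the submit, so a real handler would still need to tolerate a not-leader failure from the replicated call.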

          People

            Assignee: Unassigned
            Reporter: George Huang (ghuangups)