  1. Apache Ozone
  2. HDDS-10740

[Hbase-Ozone] HMaster down due to "ContainerNotOpenException: Container in CLOSED state"

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ozone Datanode, SCM
    • Labels: None

    Description

      HMaster crashes abruptly. The HMaster logs show errors like the following just before the crash:

      java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state 

      Full related log:

      2024-04-22 08:05:26,709 ERROR org.apache.ratis.client.impl.OrderedAsync: Failed to send request, message=cmdType: WriteChunk
      traceID: ""
      containerID: 2061
      datanodeUuid: "a3535b74-fc72-443e-b66d-cb0da825c469"
      writeChunk {
        blockID {
          containerID: 2061
          localID: 113750153625619822
          blockCommitSequenceId: 18190201
          replicaIndex: 0
        }
        chunkData {
          chunkName: "113750153625619822_chunk_2699"
          offset: 1497659
          len: 98
          checksumData {
            type: CRC32
            bytesPerChecksum: 16384
            checksums: "U\321\246\212"
          }
        }
      }
      encodedToken: "VwoFaGJhc2USJWNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIY5ILNz_AxKAEoAjCAgICAAToWCLud9efY8qvz5QEQlLemrcCwiumPASCvEZ-NqRpFuh6-H1ottQt1_14NiKrfTck8ZuC5FzTX6xBIRERTX0JMT0NLX1RPS0VOLGNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIAAAAAAAAA"
      , data.size=98
      java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
              at org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:374)
              at org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:173)
              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
              at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
              at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
              at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
              at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
              at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$8(OrderedAsync.java:243)
              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
              at java.util.Optional.ifPresent(Optional.java:159)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
              at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
              at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
              at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
              at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
              at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.validateContainerCommand(HddsDispatcher.java:560)
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.startTransaction(ContainerStateMachine.java:415)
              at org.apache.ratis.server.impl.RaftServerImpl.writeAsync(RaftServerImpl.java:941)
              at org.apache.ratis.server.impl.RaftServerImpl.replyFuture(RaftServerImpl.java:919)
              at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:896)
              at org.apache.ratis.server.impl.RaftServerImpl.lambda$null$11(RaftServerImpl.java:885)
              at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
              at org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$12(RaftServerImpl.java:885)
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
              ... 3 more
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: Container 2061 in CLOSED state
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
              at org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:259)
              at org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:449)
              at org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:435)
              at org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:402)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
              at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
              at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
              at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
              at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
              at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
              ... 3 more
      2024-04-22 08:05:26,832 ERROR org.apache.ratis.client.impl.OrderedAsync: Failed to send request, message=cmdType: PutBlock
      traceID: ""
      containerID: 2061
      datanodeUuid: "a3535b74-fc72-443e-b66d-cb0da825c469"
      putBlock {
        blockData {
          blockID {
            containerID: 2061
            localID: 113750153625619822
            blockCommitSequenceId: 0
          }
          metadata {
            key: "TYPE"
            value: "KEY"
          }
          metadata {
            key: "incremental"
          }
          chunks {
            chunkName: "113750153625619822_chunk_0"
            offset: 0
            len: 1497757
            checksumData {
              type: CRC32
              bytesPerChecksum: 16384
              checksums: ".M]\274"
              checksums: "\341f@\350"
              checksums: "3\215@\243"
              checksums: "\027\220|\226"
              checksums: "xE8B"
              checksums: ",\300\263\233"
              checksums: "#\314\246x"
              checksums: "\313\220\211\362"
              checksums: "\337P6\004"
              checksums: "\351\334(\032"
              checksums: "l\315\005["
              checksums: "P\311\212\245"
              checksums: "\355R\361\235"
              checksums: "\256\341\206\304"
              checksums: "x\304\353\322"
              checksums: "q\257\337\027"
              checksums: "e\253\304\241"
              checksums: "Fy`6"
              checksums: "\351A\221\351"
              checksums: "\270\243\366T"
              checksums: "\246\264aN"
              checksums: "V`\033\003"
              checksums: " $F\214"
      .
      .
      .
              checksums: "D5\360\350"
              checksums: "\360w\314X"
              checksums: "\350\025\003\263"
              checksums: "\347\310\334\215"
            }
          }
        }
        eof: false
      }
      encodedToken: "VwoFaGJhc2USJWNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIY5ILNz_AxKAEoAjCAgICAAToWCLud9efY8qvz5QEQlLemrcCwiumPASCvEZ-NqRpFuh6-H1ottQt1_14NiKrfTck8ZuC5FzTX6xBIRERTX0JMT0NLX1RPS0VOLGNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIAAAAAAAAA"
      , data.size=0
      java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
              at org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:374)
              at org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:173)
              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
              at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
              at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
              at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
              at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
              at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$8(OrderedAsync.java:243)
              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
              at java.util.Optional.ifPresent(Optional.java:159)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
              at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
              at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
              at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
              at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
              at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.validateContainerCommand(HddsDispatcher.java:560)
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.startTransaction(ContainerStateMachine.java:415)
              at org.apache.ratis.server.impl.RaftServerImpl.writeAsync(RaftServerImpl.java:941)
              at org.apache.ratis.server.impl.RaftServerImpl.replyFuture(RaftServerImpl.java:919)
              at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:896)
              at org.apache.ratis.server.impl.RaftServerImpl.lambda$null$11(RaftServerImpl.java:885)
              at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
              at org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$12(RaftServerImpl.java:885)
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
              ... 3 more
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: Container 2061 in CLOSED state
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
              at org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:259)
              at org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:449)
              at org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:435)
              at org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:402)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
              at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
              at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
              at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
              at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
              at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
              ... 3 more
      2024-04-22 08:05:30,039 WARN org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=191, target=10.140.52.141:9858}}. 
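
      In both traces above, the ContainerNotOpenException is not the top-level exception: it is wrapped in a StateMachineException, which is in turn wrapped in a CompletionException. Any code that wants to detect the closed-container condition therefore has to walk the cause chain. The following is a minimal sketch of that walk, not the actual Ozone or HBase client logic. The two exception classes are hypothetical stand-ins that only borrow the names from the log, and findCause is a helper introduced here for illustration.

```java
import java.util.Optional;
import java.util.concurrent.CompletionException;

public class CauseChainSketch {
    // Hypothetical stand-ins for the exception types named in the log.
    // The real classes live in org.apache.ratis and org.apache.hadoop.hdds.
    static class StateMachineException extends Exception {
        StateMachineException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class ContainerNotOpenException extends Exception {
        ContainerNotOpenException(String message) {
            super(message);
        }
    }

    /**
     * Walks the cause chain of t and returns the first throwable that is an
     * instance of type. The depth limit guards against cyclic cause chains.
     */
    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (type.isInstance(cur)) {
                return Optional.of(type.cast(cur));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Rebuild the same nesting that appears in the log.
        Throwable root = new ContainerNotOpenException("Container 2061 in CLOSED state");
        Throwable wrapped = new CompletionException(
                new StateMachineException("from Server (id elided)", root));

        String message = findCause(wrapped, ContainerNotOpenException.class)
                .map(Throwable::getMessage)
                .orElse("no ContainerNotOpenException in chain");
        System.out.println(message);
    }
}
```

      Matching on the exception type rather than on the message text keeps the check independent of the container ID and state that appear in the message.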

      About two and a half minutes later, the HMaster aborts with "Log rolling failed":

      2024-04-22 08:08:02,026 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master ccycloud-7.ozn-hb973chf3oz.xyz,22001,1713770648404: Log rolling failed *****
      java.lang.RuntimeException
              at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALMetadata(AsyncProtobufLogWriter.java:217)
              at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeMagicAndWALHeader(AsyncProtobufLogWriter.java:223)
              at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:164)
              at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
              at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
              at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
              at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
              at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
              at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
      2024-04-22 08:08:02,034 INFO org.apache.ranger.plugin.util.PolicyRefresher: PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
      java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
              at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
              at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208)
      2024-04-22 08:08:02,037 INFO org.apache.ranger.audit.provider.AuditProviderFactory: ==> JVMShutdownHook.run() 

          People

            Assignee: Unassigned
            Reporter: Pratyush Bhatt (pratyush.bhatt)
            Votes: 0
            Watchers: 2
