Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
HMaster crashes abruptly. Just before the crash, the logs contain entries like this:
java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
Full related log:
2024-04-22 08:05:26,709 ERROR org.apache.ratis.client.impl.OrderedAsync: Failed to send request, message=cmdType: WriteChunk traceID: "" containerID: 2061 datanodeUuid: "a3535b74-fc72-443e-b66d-cb0da825c469" writeChunk { blockID { containerID: 2061 localID: 113750153625619822 blockCommitSequenceId: 18190201 replicaIndex: 0 } chunkData { chunkName: "113750153625619822_chunk_2699" offset: 1497659 len: 98 checksumData { type: CRC32 bytesPerChecksum: 16384 checksums: "U\321\246\212" } } } encodedToken: "VwoFaGJhc2USJWNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIY5ILNz_AxKAEoAjCAgICAAToWCLud9efY8qvz5QEQlLemrcCwiumPASCvEZ-NqRpFuh6-H1ottQt1_14NiKrfTck8ZuC5FzTX6xBIRERTX0JMT0NLX1RPS0VOLGNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIAAAAAAAAA" , data.size=98
java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
    at org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:374)
    at org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:173)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
    at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
    at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
    at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
    at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$8(OrderedAsync.java:243)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
    at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
    at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.validateContainerCommand(HddsDispatcher.java:560)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.startTransaction(ContainerStateMachine.java:415)
    at org.apache.ratis.server.impl.RaftServerImpl.writeAsync(RaftServerImpl.java:941)
    at org.apache.ratis.server.impl.RaftServerImpl.replyFuture(RaftServerImpl.java:919)
    at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:896)
    at org.apache.ratis.server.impl.RaftServerImpl.lambda$null$11(RaftServerImpl.java:885)
    at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
    at org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$12(RaftServerImpl.java:885)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    ... 3 more
Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: Container 2061 in CLOSED state
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:259)
    at org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:449)
    at org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:435)
    at org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:402)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
    at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
    at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    ... 3 more
2024-04-22 08:05:26,832 ERROR org.apache.ratis.client.impl.OrderedAsync: Failed to send request, message=cmdType: PutBlock traceID: "" containerID: 2061 datanodeUuid: "a3535b74-fc72-443e-b66d-cb0da825c469" putBlock { blockData { blockID { containerID: 2061 localID: 113750153625619822 blockCommitSequenceId: 0 } metadata { key: "TYPE" value: "KEY" } metadata { key: "incremental" } chunks { chunkName: "113750153625619822_chunk_0" offset: 0 len: 1497757 checksumData { type: CRC32 bytesPerChecksum: 16384 checksums: ".M]\274" checksums: "\341f@\350" checksums: "3\215@\243" checksums: "\027\220|\226" checksums: "xE8B" checksums: ",\300\263\233" checksums: "#\314\246x" checksums: "\313\220\211\362" checksums: "\337P6\004" checksums: "\351\334(\032" checksums: "l\315\005[" checksums: "P\311\212\245" checksums: "\355R\361\235" checksums: "\256\341\206\304" checksums: "x\304\353\322" checksums: "q\257\337\027" checksums: "e\253\304\241" checksums: "Fy`6" checksums: "\351A\221\351" checksums: "\270\243\366T" checksums: "\246\264aN" checksums: "V`\033\003" checksums: " $F\214" . . . checksums: "D5\360\350" checksums: "\360w\314X" checksums: "\350\025\003\263" checksums: "\347\310\334\215" } } } eof: false } encodedToken: "VwoFaGJhc2USJWNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIY5ILNz_AxKAEoAjCAgICAAToWCLud9efY8qvz5QEQlLemrcCwiumPASCvEZ-NqRpFuh6-H1ottQt1_14NiKrfTck8ZuC5FzTX6xBIRERTX0JMT0NLX1RPS0VOLGNvbklEOiAyMDYxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MTk4MjIAAAAAAAAA" , data.size=0
java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
    at org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:374)
    at org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:173)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
    at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
    at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:144)
    at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:348)
    at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$8(OrderedAsync.java:243)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
    at java.util.Optional.ifPresent(Optional.java:159)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
    at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
    at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ratis.protocol.exceptions.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server 50954181-a303-4e2f-aca5-c70f235191f1@group-9CCD951DCB08: Container 2061 in CLOSED state
    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.validateContainerCommand(HddsDispatcher.java:560)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.startTransaction(ContainerStateMachine.java:415)
    at org.apache.ratis.server.impl.RaftServerImpl.writeAsync(RaftServerImpl.java:941)
    at org.apache.ratis.server.impl.RaftServerImpl.replyFuture(RaftServerImpl.java:919)
    at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:896)
    at org.apache.ratis.server.impl.RaftServerImpl.lambda$null$11(RaftServerImpl.java:885)
    at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
    at org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitClientRequestAsync$12(RaftServerImpl.java:885)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    ... 3 more
Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: Container 2061 in CLOSED state
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:259)
    at org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:449)
    at org.apache.ratis.client.impl.ClientProtoUtils.toStateMachineException(ClientProtoUtils.java:435)
    at org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:402)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
    at org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
    at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
    at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    ... 3 more
2024-04-22 08:05:30,039 WARN org.apache.ratis.grpc.GrpcUtil: Timed out gracefully shutting down connection: ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=191, target=10.140.52.141:9858}}.
And then the Master goes down:
2024-04-22 08:08:02,026 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master ccycloud-7.ozn-hb973chf3oz.xyz,22001,1713770648404: Log rolling failed *****
java.lang.RuntimeException
    at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeWALMetadata(AsyncProtobufLogWriter.java:217)
    at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.writeMagicAndWALHeader(AsyncProtobufLogWriter.java:223)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:164)
    at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
    at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
    at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
    at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
2024-04-22 08:08:02,034 INFO org.apache.ranger.plugin.util.PolicyRefresher: PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208)
2024-04-22 08:08:02,037 INFO org.apache.ranger.audit.provider.AuditProviderFactory: ==> JVMShutdownHook.run()