Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Later
-
None
-
None
Description
ContainerNotOpen exception exception is thrown by datanode when client is writing to a non open container. Currently ozone client sees this as failure and would increment the retry count. If client reaches a configured retry count it fails the write. Map reduce jobs were seen failing due to this error with default retry count of 5.
Idea is to not consider errors due to closed container in retry count. This would make sure that ozone client writes do not fail due to closed container exceptions.
2020-05-15 02:20:28,375 ERROR [main] org.apache.hadoop.ozone.client.io.KeyOutputStream: Retry request failed. retries get failed due to exceeded maximum allowed retries number: 5 java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server e2eec12f-02c5-46e2-9c23-14d6445db219@group-A3BF3ABDC307: Container 15 in CLOSED state at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:551) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$3(BlockOutputStream.java:638) at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884) at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99) at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60) at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:143) at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:314) at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$9(OrderedAsync.java:242) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:284) at java.util.Optional.ifPresent(Optional.java:159) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:340) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:264) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:284) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:267) at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:436) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:658) ...
Attachments
Issue Links
- links to