Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-3611

Ozone client should not consider closed container error as failure

    XMLWordPrintableJSON

    Details

    • Target Version/s:

      Description

      ContainerNotOpen exception exception is thrown by datanode when client is writing to a non open container. Currently ozone client sees this as failure and would increment the retry count. If client reaches a configured retry count it fails the write. Map reduce jobs were seen failing due to this error with default retry count of 5.

      Idea is to not consider errors due to closed container in retry count. This would make sure that ozone client writes do not fail due to closed container exceptions.

      2020-05-15 02:20:28,375 ERROR [main] org.apache.hadoop.ozone.client.io.KeyOutputStream: Retry request failed. retries get failed due to exceeded maximum allowed retries number: 5
      java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server e2eec12f-02c5-46e2-9c23-14d6445db219@group-A3BF3ABDC307: Container 15 in CLOSED state
              at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:551)
              at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$3(BlockOutputStream.java:638)
              at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
              at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
              at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
              at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
              at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:143)
              at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:314)
              at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$9(OrderedAsync.java:242)
              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:284)
              at java.util.Optional.ifPresent(Optional.java:159)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:340)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:264)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:284)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:267)
              at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:436)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:658)
      ...

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                ljain Lokesh Jain
                Reporter:
                ljain Lokesh Jain
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: