Apache Ozone / HDDS-3611

Ozone client should not consider closed container error as failure


Details

    Description

      A ContainerNotOpenException is thrown by the datanode when a client writes to a container that is not open. Currently the Ozone client treats this as a failure and increments its retry count; once the client reaches the configured retry limit, the write fails. MapReduce jobs were seen failing with this error under the default retry limit of 5.

      The idea is to exclude errors caused by closed containers from the retry count. This ensures that Ozone client writes do not fail because of closed-container exceptions.

      2020-05-15 02:20:28,375 ERROR [main] org.apache.hadoop.ozone.client.io.KeyOutputStream: Retry request failed. retries get failed due to exceeded maximum allowed retries number: 5
      java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server e2eec12f-02c5-46e2-9c23-14d6445db219@group-A3BF3ABDC307: Container 15 in CLOSED state
              at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:551)
              at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$3(BlockOutputStream.java:638)
              at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
              at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
              at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
              at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
              at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:143)
              at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:314)
              at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$9(OrderedAsync.java:242)
              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:284)
              at java.util.Optional.ifPresent(Optional.java:159)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:340)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:264)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:284)
              at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:267)
              at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:436)
              at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:658)
      ...
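      A minimal sketch, in Java, of the proposed retry policy: closed-container errors trigger another attempt without consuming the retry budget, while any other IOException still counts. The class name ClosedContainerAwareRetry, the WriteAttempt callback, and the local ContainerNotOpenException stand-in are all hypothetical; this is not Ozone's actual KeyOutputStream code.

```java
import java.io.IOException;

/** Hypothetical sketch of a write-retry policy; not Ozone's actual KeyOutputStream logic. */
public class ClosedContainerAwareRetry {

  /** Local stand-in for Ozone's ContainerNotOpenException (assumption for illustration). */
  static class ContainerNotOpenException extends IOException {
    ContainerNotOpenException(String msg) { super(msg); }
  }

  /** Hypothetical single write attempt; attemptNumber counts every try, counted or not. */
  interface WriteAttempt {
    void write(int attemptNumber) throws IOException;
  }

  private final int maxRetries;

  ClosedContainerAwareRetry(int maxRetries) { this.maxRetries = maxRetries; }

  /** Runs the write and returns how many retries were charged against maxRetries. */
  int writeWithRetries(WriteAttempt attempt) throws IOException {
    int countedRetries = 0;
    int attemptNumber = 0;
    while (true) {
      try {
        attempt.write(attemptNumber++);
        return countedRetries;
      } catch (ContainerNotOpenException e) {
        // Closed container: retry (e.g. on a newly allocated block in an open
        // container) without incrementing the retry count.
        continue;
      } catch (IOException e) {
        // Any other failure consumes one retry; give up once the budget is exceeded.
        countedRetries++;
        if (countedRetries > maxRetries) {
          throw new IOException("exceeded maximum allowed retries: " + maxRetries, e);
        }
      }
    }
  }
}
```

      As written, the sketch retries closed-container errors without any bound, so a client built this way would presumably still need a separate limit (for example, on the number of new block allocations) to avoid looping forever if every container it is offered is closed.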

    People

      Assignee: Lokesh Jain (ljain)
      Reporter: Lokesh Jain (ljain)