Apache Ozone / HDDS-6109

Ozone Client should retry unflushed buffers on new pipeline on GroupMismatch Exception.


Details

    Description

      Currently, if a pipeline is closed in the middle of a write, the client gets a GroupMismatchException, which surfaces as an exception to the application using the client. The scenario is exercised by the following test: https://github.com/kerneltime/ozone/blob/a43735eba7a2eea7769ea146a136aebae3b8b84b/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java#L175-L284

      2021-12-14 14:38:49,683 [Command processor thread] INFO server.RaftServer$Division (ServerState.java:close(419)) - 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87: closes. applyIndex: 2
      2021-12-14 14:38:49,683 [2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker] INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:run(327)) - 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker was interrupted, exiting. There are 0 tasks remaining in the queue.
      2021-12-14 14:38:49,686 [Command processor thread] INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:close(237)) - 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker close()
      2021-12-14 14:38:49,691 [Command processor thread] INFO server.RaftServer$Division (RaftServerImpl.java:groupRemove(382)) - 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87: Succeed to remove RaftStorageDirectory Storage Directory /Users/ritesh/IdeaProjects/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-4ef3409b-a4e4-4564-b417-667c302b8de2/datanode-1/data/ratis/pipelineXXX
      2021-12-14 14:38:49,691 [Command processor thread] INFO commandhandler.ClosePipelineCommandHandler (ClosePipelineCommandHandler.java:handle(78)) - Close Pipeline PipelineID=pipelineXXX command on datanode 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1.
      2021-12-14 14:38:49,728 [EventQueue-PipelineReportForPipelineReportHandler] INFO pipeline.PipelineReportHandler (PipelineReportHandler.java:processPipelineReport(113)) - Reported pipeline PipelineID=pipelineXXX is not found
      2021-12-14 14:38:51,926 [Listener at 127.0.0.1/52003] WARN scm.XceiverClientRatis (XceiverClientRatis.java:watchForCommit(266)) - 3 way commit failed on pipeline Pipeline[ Id: pipelineXXX, Nodes: 8c998abc-6bf8-426d-ae41-6d32c225dbb3{ip: 192.168.86.246, host: 21884.lan, ports: [REPLICATION=52022, RATIS=52023, RATIS_ADMIN=52023, RATIS_SERVER=52023, STANDALONE=52024], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}82f2254c-9af0-4452-9f3a-881c3df8ce31{ip: 192.168.86.246, host: 21884.lan, ports: [REPLICATION=52016, RATIS=52017, RATIS_ADMIN=52017, RATIS_SERVER=52017, STANDALONE=52018], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}2d07f9d1-28a1-49bc-a902-d2a1291cbdf1{ip: 192.168.86.246, host: 21884.lan, ports: [REPLICATION=52019, RATIS=52020, RATIS_ADMIN=52020, RATIS_SERVER=52020, STANDALONE=52021], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:82f2254c-9af0-4452-9f3a-881c3df8ce31, CreationTimestamp2021-12-14T14:38:39.305-08:00[America/Los_Angeles]]
      java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed RaftClientRequest:client-214E4F4A64F9->8c998abc-6bf8-426d-ae41-6d32c225dbb3@group-89F59A98FF87, cid=37, seq=0, Watch-ALL_COMMITTED(6), null for 2 attempts with RequestTypeDependentRetryPolicy{WRITE->org.apache.ratis.retry.ExceptionDependentRetry@7754720f, WATCH->org.apache.ratis.retry.ExceptionDependentRetry@514c16e5}
      at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
      at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
      at org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:263)
      at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:199)
      at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:166)
      at org.apache.hadoop.hdds.scm.storage.RatisBlockOutputStream.sendWatchForCommit(RatisBlockOutputStream.java:101)
      at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:373)
      at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:533)
      at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:547)
      at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:137)
      at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleStreamAction(KeyOutputStream.java:495)
      at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:469)
      at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:522)
      at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
      at org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.testContainerStateMachineTransitionOnUnhealthyReplicas(TestContainerStateMachineFailures.java:225)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
      at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
      at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
      at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
      at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
      at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
      at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
      at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
      at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
      at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
      at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
      at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
      at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
      at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
      at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
      at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
      at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
      at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
      at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
      Caused by: org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed RaftClientRequest:client-214E4F4A64F9->8c998abc-6bf8-426d-ae41-6d32c225dbb3@group-89F59A98FF87, cid=37, seq=0, Watch-ALL_COMMITTED(6), null for 2 attempts with RequestTypeDependentRetryPolicy{WRITE->org.apache.ratis.retry.ExceptionDependentRetry@7754720f, WATCH->org.apache.ratis.retry.ExceptionDependentRetry@514c16e5}
      at org.apache.ratis.client.impl.RaftClientImpl.noMoreRetries(RaftClientImpl.java:272)
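
      The behavior requested in the title could be sketched roughly as below. This is only an illustration under stated assumptions, not the Ozone client implementation: Pipeline, BufferingWriter, the allocator, and the local GroupMismatchException class are all hypothetical stand-ins that do not exist in Ozone or Ratis under these names. The idea shown is that chunks stay in a client-side buffer until a pipeline accepts them, so when the pipeline's Raft group has been removed the writer can ask for a new pipeline and replay whatever is still unflushed.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

class RetryOnNewPipelineSketch {

  /** Hypothetical stand-in for a group-mismatch failure; not the Ratis class. */
  static class GroupMismatchException extends Exception {
    GroupMismatchException(String msg) { super(msg); }
  }

  /** Hypothetical pipeline that accepts chunks until it is closed. */
  static class Pipeline {
    final String id;
    final List<String> committed = new ArrayList<>();
    boolean closed;

    Pipeline(String id) { this.id = id; }

    void write(String chunk) throws GroupMismatchException {
      if (closed) {
        throw new GroupMismatchException("group removed for " + id);
      }
      committed.add(chunk);
    }
  }

  /** Hypothetical writer: buffers chunks and replays unflushed ones on a new pipeline. */
  static class BufferingWriter {
    private final Deque<String> unflushed = new ArrayDeque<>();
    private final Supplier<Pipeline> allocator;
    private final int maxRetries;
    private Pipeline current;

    BufferingWriter(Supplier<Pipeline> allocator, int maxRetries) {
      this.allocator = allocator;
      this.maxRetries = maxRetries;
      this.current = allocator.get();
    }

    void write(String chunk) {
      unflushed.addLast(chunk);
    }

    void flush() throws IOException {
      int attempts = 0;
      while (!unflushed.isEmpty()) {
        try {
          // A chunk leaves the buffer only after the pipeline accepted it.
          current.write(unflushed.peekFirst());
          unflushed.pollFirst();
        } catch (GroupMismatchException e) {
          if (++attempts > maxRetries) {
            throw new IOException("no usable pipeline after " + maxRetries + " retries", e);
          }
          // Switch to a fresh pipeline; the remaining buffer is replayed there.
          current = allocator.get();
        }
      }
    }
  }

  /** Writes one chunk, closes the first pipeline, then writes two more chunks. */
  static List<String> demo() throws IOException {
    List<Pipeline> allocated = new ArrayList<>();
    Supplier<Pipeline> allocator = () -> {
      Pipeline p = new Pipeline("pipeline-" + allocated.size());
      allocated.add(p);
      return p;
    };
    BufferingWriter writer = new BufferingWriter(allocator, 2);
    writer.write("a");
    writer.flush();                    // accepted by pipeline-0
    allocated.get(0).closed = true;    // simulate the pipeline being closed mid-write
    writer.write("b");
    writer.write("c");
    writer.flush();                    // replayed on pipeline-1
    return allocated.get(1).committed;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(demo());
  }
}
```

      In this sketch the retry limit is a plain counter per flush; a real client would presumably also need to decide which already-acknowledged data is safe to drop and how to exclude the failed pipeline from re-allocation, which is left out here.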
      

            People

              ritesh Ritesh Shukla
              ritesh Ritesh Shukla