  1. Hadoop Distributed Data Store
  2. HDDS-3277

Datanodes do not close pipeline when pipeline directory is deleted.

      Description

      First, the pipeline directory was deleted on three datanodes (datanode-0, datanode-3, and datanode-5):

      2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures (FailureManager.java:fail(49)) - failing with, DeletePipelineFailure
      2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-0/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
      2020-03-25 19:44:22,679 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
      2020-03-25 19:44:22,681 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-5/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
      

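      However, no pipeline failure handling was issued to SCM. Instead, about two seconds later, the Ratis state machine on datanode-3 failed to take a snapshot because its directory no longer existed: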

      2020-03-25 19:44:24,532 [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] ERROR ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(302)) - group-C95A81785DF9: Failed to write snapshot at:(t:1, i:2037) file /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037
      2020-03-25 19:44:24,532 [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] ERROR impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(269)) - b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater: Failed to take snapshot
      java.io.FileNotFoundException: /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037 (No such file or directory)
              at java.io.FileOutputStream.open0(Native Method)
              at java.io.FileOutputStream.open(FileOutputStream.java:270)
              at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
              at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:296)
              at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:258)
              at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:250)
              at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:169)
              at java.lang.Thread.run(Thread.java:748)
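
      The trace above shows the snapshot write failing with a FileNotFoundException that is only logged. One possible direction, sketched here under stated assumptions, is to treat a missing pipeline directory as a pipeline failure and hand it to a callback rather than only logging it. The names SnapshotGuard, PipelineFailureListener, and onPipelineStorageLost are hypothetical and are not Ozone or Ratis APIs; how a real datanode would notify SCM is outside the scope of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from Ozone or Ratis. */
class SnapshotGuard {

  /** Hypothetical callback standing in for whatever would notify SCM. */
  interface PipelineFailureListener {
    void onPipelineStorageLost(String pipelineId, Path missingDir);
  }

  private final String pipelineId;
  private final Path pipelineDir;
  private final PipelineFailureListener listener;

  SnapshotGuard(String pipelineId, Path pipelineDir,
      PipelineFailureListener listener) {
    this.pipelineId = pipelineId;
    this.pipelineDir = pipelineDir;
    this.listener = listener;
  }

  /**
   * Writes a snapshot file under "sm", mirroring the path layout in the log.
   * Returns true on success, or false if the directory is gone, in which
   * case the listener is notified instead of an exception being thrown.
   */
  boolean takeSnapshot(long term, long index, byte[] payload)
      throws IOException {
    Path smDir = pipelineDir.resolve("sm");
    if (!Files.isDirectory(smDir)) {
      listener.onPipelineStorageLost(pipelineId, smDir);
      return false;
    }
    Path snapshot = smDir.resolve("snapshot." + term + "_" + index);
    try {
      Files.write(snapshot, payload);
      return true;
    } catch (NoSuchFileException e) {
      // The directory vanished between the check and the write.
      listener.onPipelineStorageLost(pipelineId, smDir);
      return false;
    }
  }

  /** Demo: one successful snapshot, then one after the directory is deleted. */
  static List<String> demo() {
    List<String> reported = new ArrayList<>();
    try {
      Path root = Files.createTempDirectory("pipeline-demo");
      Path smDir = Files.createDirectories(root.resolve("sm"));
      SnapshotGuard guard = new SnapshotGuard("demo-pipeline", root,
          (id, dir) -> reported.add(id));
      guard.takeSnapshot(1, 1, new byte[] {42}); // directory exists
      Files.delete(smDir.resolve("snapshot.1_1"));
      Files.delete(smDir);                       // simulate the injected failure
      guard.takeSnapshot(1, 2, new byte[] {42}); // reported, not thrown
      Files.delete(root);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return reported;
  }
}
```

      Calling SnapshotGuard.demo() exercises both paths: the first snapshot is written normally, and the second, taken after the directory is removed, is routed to the listener. Other I/O errors still propagate as IOException, so only the "storage is gone" case is treated as a pipeline failure.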
      

            People

            • Assignee: Unassigned
            • Reporter: Mukul Kumar Singh (msingh)