Apache Ozone / HDDS-3277

Datanodes do not close pipeline when pipeline directory is deleted.


Details

    Description

      First, the pipeline directory was deleted on datanode-0, datanode-3 and datanode-5:

      2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures (FailureManager.java:fail(49)) - failing with, DeletePipelineFailure
      2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-0/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
      2020-03-25 19:44:22,679 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
      2020-03-25 19:44:22,681 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-5/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
      

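      However, no pipeline failure handling was triggered towards SCM. About two seconds later the datanode failed to take a snapshot because the directory was gone: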

      2020-03-25 19:44:24,532 [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] ERROR ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(302)) - group-C95A81785DF9: Failed to write snapshot at:(t:1, i:2037) file /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037
      2020-03-25 19:44:24,532 [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] ERROR impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(269)) - b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater: Failed to take snapshot
      java.io.FileNotFoundException: /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037 (No such file or directory)
              at java.io.FileOutputStream.open0(Native Method)
              at java.io.FileOutputStream.open(FileOutputStream.java:270)
              at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
              at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:296)
              at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:258)
              at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:250)
              at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:169)
              at java.lang.Thread.run(Thread.java:748)
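
The behaviour this issue asks for could be sketched as follows: when the snapshot write fails because the pipeline directory is missing, the failure is passed to a handler instead of only being logged. This is a minimal standalone sketch, not the Ozone or Ratis code. `PipelineFailureHandler`, `onPipelineFailure` and this `takeSnapshot` signature are hypothetical names introduced for illustration; in a real datanode the handler would be whatever mechanism reports a pipeline failure to SCM.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SnapshotFailureSketch {

  /** Hypothetical callback standing in for "report a pipeline failure to SCM". */
  interface PipelineFailureHandler {
    void onPipelineFailure(String groupId, String reason);
  }

  /**
   * Tries to write <pipelineDir>/sm/snapshot.<term>_<index>.
   * On any IOException (including FileNotFoundException when the directory
   * has been deleted) the handler is notified rather than the error only
   * being logged. Returns true when the file was written.
   */
  static boolean takeSnapshot(Path pipelineDir, String groupId, long term, long index,
      PipelineFailureHandler handler) {
    File snapshot = pipelineDir.resolve("sm")
        .resolve("snapshot." + term + "_" + index).toFile();
    try (FileOutputStream out = new FileOutputStream(snapshot)) {
      out.write(0); // placeholder payload, not a real snapshot format
      return true;
    } catch (IOException e) {
      handler.onPipelineFailure(groupId, e.getClass().getSimpleName());
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("pipeline");
    Path sm = Files.createDirectory(dir.resolve("sm"));
    List<String> reported = new ArrayList<>();
    PipelineFailureHandler handler = (group, reason) -> reported.add(group + ": " + reason);

    // Directory present: the snapshot is written and nothing is reported.
    System.out.println(takeSnapshot(dir, "group-C95A81785DF9", 1, 2037, handler));

    // Simulate the chaos test: remove the whole pipeline directory.
    Files.delete(sm.resolve("snapshot.1_2037"));
    Files.delete(sm);
    Files.delete(dir);

    // Directory missing: the write fails and the handler is called.
    System.out.println(takeSnapshot(dir, "group-C95A81785DF9", 1, 2037, handler));
    System.out.println(reported);
  }
}
```

Whether the snapshot path is the right place to detect the missing directory, or whether a separate check is preferable, is a design question this sketch does not settle.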
      


          People

            sumitagrawl Sumit Agrawal
            msingh Mukul Kumar Singh
            Votes: 0
            Watchers: 5
