Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-440

Datanode loops forever if it cannot create directories

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • None
    • 0.2.1
    • Ozone Datanode

    Description

      Datanode starts but runs in a tight loop forever if it cannot create the DataNode ID directory e.g. due to permissions issues. I encountered this by having a typo in my ozone-site.xml for ozone.scm.datanode.id.

      In just a few minutes the DataNode had generated over 20GB of log+out files with the following exception:

      2018-09-12 17:28:20,649 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 2
      63:
      java.io.IOException: Unable to create datanode ID directories.
      at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211)
      at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131)
      at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111)
      at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      2018-09-12 17:28:20,648 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when running task in Datanode State Mach
      ine Thread - 160
      2018-09-12 17:28:20,650 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 1
      60:
      java.io.IOException: Unable to create datanode ID directories.
      at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211)
      at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131)
      at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111)
      at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)

      We should just exit since this is a fatal issue.

      Attachments

        1. HDDS-440.00.patch
          4 kB
          Bharat Viswanadham

        Activity

          People

            bharat Bharat Viswanadham
            arp Arpit Agarwal
            Votes:
            2 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: