Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
None
Description
Datanode starts but runs in a tight loop forever if it cannot create the DataNode ID directory e.g. due to permissions issues. I encountered this by having a typo in my ozone-site.xml for ozone.scm.datanode.id.
In just a few minutes the DataNode had generated over 20GB of log+out files with the following exception:
2018-09-12 17:28:20,649 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 2 63: java.io.IOException: Unable to create datanode ID directories. at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2018-09-12 17:28:20,648 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when running task in Datanode State Mach ine Thread - 160 2018-09-12 17:28:20,650 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread Datanode State Machine Thread - 1 60: java.io.IOException: Unable to create datanode ID directories. at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.writeDatanodeDetailsTo(ContainerUtils.java:211) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.persistContainerDatanodeDetails(InitDatanodeState.java:131) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:111) at org.apache.hadoop.ozone.container.common.states.datanode.InitDatanodeState.call(InitDatanodeState.java:50) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
We should just exit since this is a fatal issue.