Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.1.0
None
Description
It took me some time to debug a trivial bug.
The DataNode crashes after this mysterious error, with no explanation:
10:11:44.382 PM INFO MutableVolumeSet Moving Volume : /var/lib/hadoop-ozone/fake_datanode/data/hdds to failed Volumes
10:11:46.287 PM ERROR StateContext Critical error occurred in StateMachine, setting shutDownMachine
10:11:46.287 PM ERROR DatanodeStateMachine DatanodeStateMachine Shutdown due to an critical error
It turns out that if there are unexpected files under the hdds directory ($hdds.datanode.dir/hdds), the DN thinks the volume is bad and moves it to the failed volume list without logging any explanation. I was editing the VERSION file, and vim created a temp file under the directory. This is impossible to debug without reading the code.
HddsVolumeUtil#checkVolume()
} else if(hddsFiles.length == 2) {
  // The files should be Version and SCM directory
  if (scmDir.exists()) {
    return true;
  } else {
    logger.error("Volume {} is in Inconsistent state, expected scm " +
        "directory {} does not exist", volumeRoot, scmDir
        .getAbsolutePath());
    return false;
  }
} else {
  // The hdds root dir should always have 2 files. One is Version file
  // and other is SCM directory. <---- HERE!
  return false;
}
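To illustrate what would have made this debuggable, here is a minimal sketch of a helper that names the unexpected entries, so the final else branch could log them before returning false. The class UnexpectedHddsFiles and its find method are hypothetical and not part of HddsVolumeUtil. The sketch assumes the only expected entries are a file named VERSION and a single SCM directory whose name is passed in, and it prints to stderr instead of using the real logger. It is not the fix that was actually committed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of HddsVolumeUtil.
final class UnexpectedHddsFiles {

  // Returns the names of entries under hddsRoot other than the VERSION
  // file and the SCM directory, sorted for stable output.
  static List<String> find(File hddsRoot, String scmDirName) {
    List<String> unexpected = new ArrayList<>();
    File[] entries = hddsRoot.listFiles();
    if (entries == null) {
      // Not a directory, or an I/O error occurred while listing it.
      return unexpected;
    }
    for (File entry : entries) {
      String name = entry.getName();
      if (!name.equals("VERSION") && !name.equals(scmDirName)) {
        unexpected.add(name);
      }
    }
    Collections.sort(unexpected);
    return unexpected;
  }

  public static void main(String[] args) {
    // Assumed usage: first argument is the hdds root, second is the
    // SCM directory name.
    File hddsRoot = new File(args.length > 0 ? args[0] : ".");
    String scmDirName = args.length > 1 ? args[1] : "";
    List<String> unexpected = find(hddsRoot, scmDirName);
    if (!unexpected.isEmpty()) {
      System.err.println("Volume " + hddsRoot + " is in Inconsistent state, "
          + "unexpected files under hdds root: " + unexpected);
    }
  }
}
```

With a message like this, a stray editor swap file such as the one vim left behind would show up by name in the DataNode log right before the volume is moved to the failed list.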