If current/VERSION is missing from a storage directory, DataNode may format that storage directory even though its block files are still present.
This is very easy to reproduce. Simply launch an HDFS cluster and create some files. Then delete current/VERSION and restart the DataNode.
After the restart, the DataNode formats the directory and removes all existing block files.
The bug is: DataNode assumes that if none of current/VERSION, previous/, previous.tmp/, removed.tmp/, finalized.tmp/ and lastcheckpoint.tmp/ exists, the storage directory contains nothing important to HDFS and decides to format it.
However, block files may still exist, and in my opinion, we should do everything possible to retain the block files.
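The check described above can be sketched roughly as follows. This is only an illustration written against java.nio.file, not the actual Hadoop code: the class name NaiveStorageCheck, the method looksUnformatted and the block file name blk_1001 are hypothetical; only the marker names come from the description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the decision described above; NOT the real Storage code.
class NaiveStorageCheck {
    // Marker names taken from the description.
    static final String[] MARKERS = {
        "current/VERSION", "previous", "previous.tmp",
        "removed.tmp", "finalized.tmp", "lastcheckpoint.tmp"
    };

    // Returns true when none of the markers exist, i.e. the condition
    // under which the directory is treated as not formatted.
    static boolean looksUnformatted(Path storageDir) {
        for (String marker : MARKERS) {
            if (Files.exists(storageDir.resolve(marker))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("storage");
        Path current = Files.createDirectories(dir.resolve("current"));
        Files.createFile(current.resolve("blk_1001")); // stand-in block file
        // current/VERSION is absent, so the check reports "unformatted"
        // even though current/ still holds a block file.
        System.out.println(looksUnformatted(dir)); // prints "true"
    }
}
```

Note that the check never looks inside current/, which is why a missing VERSION file alone is enough to trigger formatting.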
I have two suggestions:
- Check whether the current/ directory is empty. If it is not, throw an InconsistentFSStateException in Storage#analyzeStorage instead of assuming it is not formatted. Or,
- In Storage#clearDirectory, before it formats the storage directory, rename or move the current/ directory. Also, log whatever is being renamed/moved.
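A minimal sketch of both suggestions, again using only java.nio.file rather than the real Storage class. Everything named here is an assumption for illustration: the class SaferStorageHandling, both method names, the "current.bak.<timestamp>" backup name, and the plain IOException that stands in for InconsistentFSStateException.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the two suggestions; NOT the real Storage code.
class SaferStorageHandling {
    // Suggestion 1: refuse to treat the directory as unformatted
    // when current/ still contains entries.
    static void failIfCurrentNotEmpty(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        if (!Files.isDirectory(current)) {
            return; // nothing to protect
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(current)) {
            if (entries.iterator().hasNext()) {
                // Stands in for InconsistentFSStateException.
                throw new IOException(current + " is not empty but VERSION is missing");
            }
        }
    }

    // Suggestion 2: move current/ aside before formatting and log the move.
    // Returns the backup path, or null if there was no current/ directory.
    static Path moveCurrentAside(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        if (!Files.isDirectory(current)) {
            return null;
        }
        // Backup name is an assumption made for this sketch.
        Path backup = storageDir.resolve("current.bak." + System.currentTimeMillis());
        Files.move(current, backup); // same parent, so this is a rename
        System.err.println("Moved " + current + " to " + backup);
        return backup;
    }
}
```

The first option fails fast and leaves the decision to an operator; the second lets the DataNode start while keeping the block files recoverable.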