Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Cannot Reproduce
- Affects Version/s: 0.18.2
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
After restarting a cluster (including rebooting the machines), the DFS became corrupted because many DataNodes failed to start, hitting the following exception:
2009-02-26 22:33:53,774 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory xxx is in an inconsistent state: version file in current directory is missing.
at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:326)
at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:105)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:306)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:223)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:3030)
at org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:2985)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2993)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3115)
This happens when a DataNode uses multiple disks and at least one of them was previously mounted read-only, so the storage version on that disk became outdated. After the reboot the disk came back read-write, and the DataNode refused to start because of the outdated version.
This is a big headache. If a DataNode has multiple disks and at least one of them carries the correct storage version, an outdated version on another disk should not bring the whole DataNode down.
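A minimal sketch of the suggested behavior (not the actual Hadoop implementation; class and method names here are hypothetical): instead of aborting when one storage directory is missing its version file, scan all configured directories, keep the consistent ones, and fail only when none are usable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: tolerate individual inconsistent storage directories.
public class StorageScan {

    // Returns the subset of data directories whose "current" directory
    // contains a VERSION file. The caller should abort DataNode startup
    // only when this list is empty, rather than on the first bad directory.
    public static List<File> usableDirs(List<File> dataDirs) {
        List<File> usable = new ArrayList<>();
        for (File dir : dataDirs) {
            File version = new File(new File(dir, "current"), "VERSION");
            if (version.isFile()) {
                usable.add(dir);
            } else {
                // Log and skip instead of throwing InconsistentFSStateException.
                System.err.println("Skipping inconsistent storage directory: " + dir);
            }
        }
        return usable;
    }

    public static void main(String[] args) throws Exception {
        // Demo: one consistent disk, one disk missing its VERSION file.
        File good = new File(System.getProperty("java.io.tmpdir"), "demo_disk1");
        new File(good, "current").mkdirs();
        new File(new File(good, "current"), "VERSION").createNewFile();

        File bad = new File(System.getProperty("java.io.tmpdir"), "demo_disk2");
        new File(bad, "current").mkdirs();

        List<File> usable = usableDirs(List.of(good, bad));
        System.out.println("Usable storage directories: " + usable.size());
    }
}
```

With this approach, the read-write-again disk with the outdated version would merely be skipped (and could later be re-synchronized), while the DataNode keeps serving blocks from its consistent disks.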