Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Labels: None
Description
Under certain circumstances, if the current/VERSION file of a storage directory is missing, the DataNode may format the storage directory even though block files are still present.
This is very easy to reproduce: launch an HDFS cluster, create some files, delete current/VERSION, and restart the DataNode.
After the restart, the DataNode formats the directory and removes all existing block files:
2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/dfs/dn/in_use.lock acquired by nodename 5314@weichiu-dn-2.vpc.cloudera.com
2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /data/dfs/dn is not formatted for BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Block pool storage directory /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted for BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
The bug is that the DataNode assumes that if none of current/VERSION, previous/, previous.tmp/, removed.tmp/, finalized.tmp/ and lastcheckpoint.tmp/ exists, the storage directory contains nothing important to HDFS, and it decides to format the directory.
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
However, block files may still exist, and in my opinion we should do everything possible to retain them.
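For illustration only, here is a simplified sketch of the decision described above; it is not the actual code at the link, and the real Storage#analyzeStorage also handles the startup option and several other states:

```java
import java.io.File;
import java.io.IOException;

// Simplified, illustrative sketch of the problematic decision: if the marker
// files/directories are absent, the storage directory is reported as
// NOT_FORMATTED, and the caller later formats (clears) it.
class AnalyzeStorageSketch {
  enum StorageState { NOT_FORMATTED, NORMAL }

  StorageState analyzeStorage(File root) throws IOException {
    boolean hasVersion  = new File(root, "current/VERSION").exists();
    boolean hasPrevious = new File(root, "previous").exists();
    // ... previous.tmp/, removed.tmp/, finalized.tmp/ and lastcheckpoint.tmp/
    // are checked the same way ...

    if (!hasVersion && !hasPrevious) {
      // Block files may still sit under current/, but this path reports the
      // directory as unformatted, and the caller then formats (clears) it.
      return StorageState.NOT_FORMATTED;
    }
    return StorageState.NORMAL;
  }
}
```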
I have two suggestions:
- Check whether the current/ directory is empty in Storage#analyzeStorage. If it is not, throw an InconsistentFSStateException instead of assuming the directory is not formatted (see the sketch after this list). Or,
- In Storage#clearDirectory, before it formats the storage directory, rename or move the current/ directory aside, and log whatever is being renamed or moved.
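A rough sketch of the first suggestion, assuming the check sits on the path that would otherwise report NOT_FORMATTED; the class and method names here are stand-ins, not a patch, and the exception only mirrors org.apache.hadoop.hdfs.server.common.InconsistentFSStateException:

```java
import java.io.File;
import java.io.IOException;

// Illustrative sketch: refuse to treat a storage directory as unformatted
// when current/ still holds data (e.g. block files) but VERSION is missing.
class RefuseToFormatSketch {
  // Simplified stand-in for the real InconsistentFSStateException.
  static class InconsistentFSStateException extends IOException {
    InconsistentFSStateException(File root, String msg) {
      super("Directory " + root + " is in an inconsistent state: " + msg);
    }
  }

  void checkBeforeReportingNotFormatted(File root) throws IOException {
    File current = new File(root, "current");
    String[] contents = current.list();
    if (contents != null && contents.length > 0) {
      // VERSION is gone but current/ is not empty: fail fast instead of
      // returning NOT_FORMATTED and letting the caller wipe the blocks.
      throw new InconsistentFSStateException(root,
          "version file is missing but current/ is not empty");
    }
  }
}
```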
Issue Links
- relates to: HDFS-11112 Journal Nodes should refuse to format non-empty directories (Resolved)