  1. Hadoop HDFS
  2. HDFS-10360

DataNode may format directory and lose blocks if current/VERSION is missing




      Under certain circumstances, if current/VERSION is missing from a storage directory, the DataNode may format that directory even though its block files are still present.

      This is very easy to reproduce: launch an HDFS cluster, create some files, delete current/VERSION from a DataNode storage directory, and restart the DataNode.

      After the restart, the DataNode formats the directory and removes all existing block files:

      2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/dfs/dn/in_use.lock acquired by nodename 5314@weichiu-dn-2.vpc.cloudera.com
      2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /data/dfs/dn is not formatted for BP-787466439-
      2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
      2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-787466439-
      2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/dfs/dn/current/BP-787466439-
      2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Block pool storage directory /data/dfs/dn/current/BP-787466439- is not formatted for BP-787466439-172
      2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
      2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-787466439- directory /data/dfs/dn/current/BP-787466439-

      The bug is: the DataNode assumes that if none of current/VERSION, previous/, previous.tmp/, removed.tmp/, finalized.tmp/ or lastcheckpoint.tmp/ exist, the storage directory contains nothing important to HDFS, and so it decides to format the directory.
      However, block files may still exist, and in my opinion, we should do everything possible to retain the block files.
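
      To make the failure mode concrete, here is a minimal sketch of the check as described above. It is a simplified model and not the actual Hadoop code: the class name, method name, and block file name (blk_1001) are hypothetical. It only shows that a check based on marker entries alone reports "not formatted" while a block file is still on disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Simplified model of the check described in this report; not the actual Hadoop code. */
class NaiveStorageCheck {
    // Marker entries named in the report. If none exists, the directory
    // is assumed to hold nothing important.
    static final List<String> MARKERS = Arrays.asList(
        "current/VERSION", "previous", "previous.tmp",
        "removed.tmp", "finalized.tmp", "lastcheckpoint.tmp");

    /** Returns true when no marker exists, i.e. the directory would be formatted. */
    static boolean looksUnformatted(Path storageDir) {
        return MARKERS.stream().noneMatch(m -> Files.exists(storageDir.resolve(m)));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dn");
        Path finalized = Files.createDirectories(dir.resolve("current/finalized"));
        Files.createFile(finalized.resolve("blk_1001")); // hypothetical block file

        // current/VERSION is absent, so the check ignores the block file.
        System.out.println(looksUnformatted(dir)); // prints "true"
    }
}
```

      The check never looks inside current/, which is why the presence of block files does not change the outcome.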

      I have two suggestions:

      1. Check whether the current/ directory is empty. If it is not, throw an InconsistentFSStateException in Storage#analyzeStorage instead of assuming the directory is not formatted. Or,
      2. In Storage#clearDirectory, rename or move the current/ directory before formatting the storage directory, and log whatever is renamed or moved.
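
      The two suggestions could look roughly like the sketch below. It is a standalone illustration using java.nio, not a patch against Storage: the class and method names are hypothetical, a plain IOException stands in for InconsistentFSStateException, and the backup name "current.bak.<timestamp>" is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Sketch of the two suggestions; names are hypothetical, not Hadoop APIs. */
class GuardedFormat {
    /** Suggestion 1: refuse to treat a non-empty current/ as unformatted. */
    static void checkCurrentIsEmpty(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        if (!Files.isDirectory(current)) {
            return; // nothing to protect
        }
        try (Stream<Path> entries = Files.list(current)) {
            if (entries.findAny().isPresent()) {
                // Stand-in for InconsistentFSStateException.
                throw new IOException("current/ is not empty but VERSION is missing: " + current);
            }
        }
    }

    /** Suggestion 2: move current/ aside before formatting, and log the move. */
    static Path moveCurrentAside(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        if (!Files.exists(current)) {
            return null; // nothing to move
        }
        // Backup name is an assumption of this sketch.
        Path backup = storageDir.resolve("current.bak." + System.currentTimeMillis());
        Files.move(current, backup); // same parent directory, so this is a rename
        System.out.println("Moved " + current + " to " + backup);
        return backup;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dn");
        Path finalized = Files.createDirectories(dir.resolve("current/finalized"));
        Files.createFile(finalized.resolve("blk_1001")); // hypothetical block file

        try {
            checkCurrentIsEmpty(dir);
        } catch (IOException e) {
            System.out.println("Refusing to format: " + e.getMessage());
        }
        moveCurrentAside(dir); // block files survive under the backup directory
    }
}
```

      Either guard keeps the block files recoverable: the first stops the DataNode before it formats anything, while the second lets formatting proceed but preserves the old contents for manual inspection.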


        1. HDFS-10360.001.patch
          3 kB
          Wei-Chiu Chuang
        2. HDFS-10360.002.patch
          16 kB
          Wei-Chiu Chuang
        3. HDFS-10360.003.patch
          4 kB
          Wei-Chiu Chuang
        4. HDFS-10360.004.patch
          7 kB
          Wei-Chiu Chuang
        5. HDFS-10360.004.patch
          7 kB
          Wei-Chiu Chuang
        6. HDFS-10360.005.patch
          8 kB
          Wei-Chiu Chuang
        7. HDFS-10360.007.patch
          10 kB
          Wei-Chiu Chuang

              weichiu Wei-Chiu Chuang