Hadoop HDFS / HDFS-10360

DataNode may format directory and lose blocks if current/VERSION is missing


Details

    Description

      Under certain circumstances, if the current/VERSION of a storage directory is missing, DataNode may format the storage directory even though block files are not missing.

      This is easy to reproduce: launch an HDFS cluster, create some files, delete current/VERSION, and restart the DataNode.

      After the restart, the DataNode formats the directory and removes all existing block files:

      2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/dfs/dn/in_use.lock acquired by nodename 5314@weichiu-dn-2.vpc.cloudera.com
      2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /data/dfs/dn is not formatted for BP-787466439-172.26.24.43-1462305406642
      2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
      2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
      2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
      2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Block pool storage directory /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted for BP-787466439-172.26.24.43-1462305406642
      2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
      2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory /data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
      

      The bug is that the DataNode assumes that if none of current/VERSION, previous/, previous.tmp/, removed.tmp/, finalized.tmp/ and lastcheckpoint.tmp/ exist, the storage directory contains nothing important to HDFS, and so it formats the directory.
      https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L526-L545
      However, block files may still exist, and in my opinion, we should do everything possible to retain the block files.
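
      The decision described above can be modelled with a small sketch. This is a simplified illustration, not the actual Hadoop code: the class name, the State enum, and the flat blk_1001 file (a stand-in for a real block file, which sits deeper in the block pool directory) are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Simplified model of the check described in this issue; NOT the Hadoop implementation. */
public class StorageStateSketch {
    // Hypothetical two-value state, just enough to show the problem.
    enum State { NOT_FORMATTED, NORMAL_OR_RECOVERABLE }

    // Marker entries named in the issue description, relative to the storage root.
    static final String[] MARKERS = {
        "current/VERSION", "previous", "previous.tmp",
        "removed.tmp", "finalized.tmp", "lastcheckpoint.tmp"
    };

    /** Returns NOT_FORMATTED when no marker exists; block files are never consulted. */
    static State analyze(Path root) {
        for (String marker : MARKERS) {
            if (Files.exists(root.resolve(marker))) {
                return State.NORMAL_OR_RECOVERABLE;
            }
        }
        return State.NOT_FORMATTED;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dn");
        Path current = Files.createDirectories(root.resolve("current"));
        Files.createFile(current.resolve("VERSION"));
        Files.createFile(current.resolve("blk_1001")); // stand-in for a block file

        System.out.println(analyze(root)); // VERSION present

        // Deleting VERSION alone flips the result, even though blk_1001 is still there.
        Files.delete(current.resolve("VERSION"));
        System.out.println(analyze(root));
    }
}
```

      In this model, removing VERSION is enough to make a directory that still holds data look unformatted, which is the situation the log above shows.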

      I have two suggestions:

      1. Check whether the current/ directory is empty. If it is not, throw an InconsistentFSStateException in Storage#analyzeStorage instead of assuming it is not formatted. Or,
      2. In Storage#clearDirectory, rename or move the current/ directory before formatting the storage directory, and log whatever is renamed or moved.
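
      Both suggestions can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed patch: the class and method names are hypothetical, a plain IOException stands in for InconsistentFSStateException, and the "current.bak.<timestamp>" backup name is invented for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical guards illustrating the two suggestions; NOT the Hadoop implementation. */
public class FormatGuardSketch {

    /** Suggestion 1: refuse to treat the directory as unformatted if current/ has content. */
    static void ensureCurrentIsEmpty(Path root) throws IOException {
        Path current = root.resolve("current");
        if (!Files.isDirectory(current)) {
            return; // nothing to protect
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(current)) {
            if (entries.iterator().hasNext()) {
                // Stand-in for InconsistentFSStateException.
                throw new IOException("VERSION is missing but current/ is not empty: " + current);
            }
        }
    }

    /** Suggestion 2: move current/ aside before formatting, and log the move. */
    static Path moveCurrentAside(Path root) throws IOException {
        Path current = root.resolve("current");
        if (!Files.isDirectory(current)) {
            return null; // nothing to move
        }
        // Hypothetical backup name; a sibling of current/ so the move is a rename.
        Path backup = root.resolve("current.bak." + System.currentTimeMillis());
        Files.move(current, backup, StandardCopyOption.ATOMIC_MOVE);
        System.out.println("Moved " + current + " to " + backup);
        return backup;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dn");
        Path current = Files.createDirectories(root.resolve("current"));
        Files.createFile(current.resolve("blk_1001")); // stand-in for a block file

        try {
            ensureCurrentIsEmpty(root);
        } catch (IOException e) {
            System.out.println("Refused to format: " + e.getMessage());
        }

        Path backup = moveCurrentAside(root);
        System.out.println("Block file preserved: " + Files.exists(backup.resolve("blk_1001")));
    }
}
```

      The first guard fails fast and leaves recovery to the operator; the second lets startup proceed while keeping the old contents on disk for later inspection.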

      Attachments

        1. HDFS-10360.001.patch
          3 kB
          Wei-Chiu Chuang
        2. HDFS-10360.002.patch
          16 kB
          Wei-Chiu Chuang
        3. HDFS-10360.003.patch
          4 kB
          Wei-Chiu Chuang
        4. HDFS-10360.004.patch
          7 kB
          Wei-Chiu Chuang
        5. HDFS-10360.004.patch
          7 kB
          Wei-Chiu Chuang
        6. HDFS-10360.005.patch
          8 kB
          Wei-Chiu Chuang
        7. HDFS-10360.007.patch
          10 kB
          Wei-Chiu Chuang

            People

              weichiu Wei-Chiu Chuang