Hadoop HDFS / HDFS-5590

Block ID and generation stamp may be reused when persistBlocks is set to false

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In a non-HA cluster with dfs.persist.blocks set to false, data loss is possible in the following case:

      1. A client creates file1, requests a block from the NN, and gets blk_id1_gs1.
      2. The client writes blk_id1_gs1 to a DN.
      3. The NN is restarted. Because persistBlocks is false, blk_id1_gs1 may not have been persisted to disk.
      4. Another client creates file2, and the NN allocates a new block with the same block ID and generation stamp, blk_id1_gs1, because block IDs and generation stamps are both allocated sequentially.

      The system may now hold two versions of blk_id1_gs1 (same ID, same generation stamp), one written for file1 and one for file2. This will cause data loss.
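
      The steps above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not HDFS code: `ToyNameNode`, `allocateBlock`, `restart`, and `reusedAfterRestart` are hypothetical names, and the model assumes that with persistBlocks set to false the counters are never written to disk (the report only says they "may not" be).

```java
/** Toy model of the reuse scenario; all names here are hypothetical, not HDFS APIs. */
public class BlockIdReuseDemo {

    /** Stand-in for a NameNode that hands out block IDs and generation stamps sequentially. */
    static final class ToyNameNode {
        private final boolean persistBlocks;
        // Values that survive a restart (the "on disk" state in this model).
        private long persistedBlockId = 0;
        private long persistedGenStamp = 0;
        // In-memory counters, lost on restart.
        private long lastBlockId = 0;
        private long lastGenStamp = 0;

        ToyNameNode(boolean persistBlocks) {
            this.persistBlocks = persistBlocks;
        }

        /** Allocates the next block name in the form blk_<id>_<gs>. */
        String allocateBlock() {
            lastBlockId++;
            lastGenStamp++;
            if (persistBlocks) {
                // Assumption: persisting the block also persists both counters.
                persistedBlockId = lastBlockId;
                persistedGenStamp = lastGenStamp;
            }
            return "blk_" + lastBlockId + "_" + lastGenStamp;
        }

        /** Simulates a restart: in-memory counters are reloaded from the persisted state. */
        void restart() {
            lastBlockId = persistedBlockId;
            lastGenStamp = persistedGenStamp;
        }
    }

    /** Runs steps 1 to 4 and reports whether file2 received the same block name as file1. */
    static boolean reusedAfterRestart(boolean persistBlocks) {
        ToyNameNode nn = new ToyNameNode(persistBlocks);
        String file1Block = nn.allocateBlock(); // steps 1-2: file1 gets a block
        nn.restart();                           // step 3: NN restarts
        String file2Block = nn.allocateBlock(); // step 4: file2 gets a block
        return file1Block.equals(file2Block);
    }

    public static void main(String[] args) {
        System.out.println("persistBlocks=false, reused: " + reusedAfterRestart(false));
        System.out.println("persistBlocks=true,  reused: " + reusedAfterRestart(true));
    }
}
```

      In this model, the collision appears only when the allocation is not persisted before the restart, which matches the condition described in the report.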

        Attachments

        1. HDFS-5590.001.patch (7 kB, Jing Zhao)
        2. HDFS-5590.000.patch (1 kB, Jing Zhao)

            People

            • Assignee: Jing Zhao (jingzhao)
            • Reporter: Jing Zhao (jingzhao)
            • Votes: 0
            • Watchers: 10
