Hadoop HDFS / HDFS-5590

Block ID and generation stamp may be reused when persistBlocks is set to false

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In a non-HA cluster with dfs.persist.blocks set to false, data loss is possible in the following case:

      1. A client creates file1, requests a block from the NN, and gets blk_id1_gs1.
      2. The client writes blk_id1_gs1 to a DN.
      3. The NN is restarted. Because persistBlocks is false, blk_id1_gs1 may not have been persisted to disk.
      4. Another client creates file2, and the NN allocates a new block with the same block ID and generation stamp, blk_id1_gs1, because both values are incremented sequentially.

      The system may now hold two versions of blk_id1_gs1 (same ID, same generation stamp), one belonging to file1 and one to file2. This can cause data loss.
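
      The four steps above can be reproduced with a toy model. This is only a sketch under stated assumptions, not HDFS code: ToyNameNode, allocate_block, restart and simulate are hypothetical names, the counters start at 1 for readability, and the model treats the "may not be persisted" case as always happening when persist_blocks is false.

```python
class ToyNameNode:
    """Toy stand-in for a NameNode with sequential ID and generation stamp counters."""

    def __init__(self, persist_blocks):
        self.persist_blocks = persist_blocks
        self.next_block_id = 1
        self.next_gen_stamp = 1
        # Counter values that would survive a restart (stands in for on-disk state).
        self._durable = (self.next_block_id, self.next_gen_stamp)

    def allocate_block(self):
        """Hand out (block_id, gen_stamp) and advance both counters sequentially."""
        block = (self.next_block_id, self.next_gen_stamp)
        self.next_block_id += 1
        self.next_gen_stamp += 1
        if self.persist_blocks:
            # Only record the allocation durably when persistence is enabled.
            self._durable = (self.next_block_id, self.next_gen_stamp)
        return block

    def restart(self):
        """Reload the counters from the durable state, discarding in-memory progress."""
        self.next_block_id, self.next_gen_stamp = self._durable


def simulate(persist_blocks):
    nn = ToyNameNode(persist_blocks)
    blk_file1 = nn.allocate_block()  # steps 1-2: file1 gets a block and writes it
    nn.restart()                     # step 3: NN restarts
    blk_file2 = nn.allocate_block()  # step 4: file2 gets a block
    return blk_file1, blk_file2


if __name__ == "__main__":
    print("persist_blocks=False:", simulate(False))
    print("persist_blocks=True: ", simulate(True))
```

      In this model, simulate(False) returns the same (block ID, generation stamp) pair for both files, which is the collision described above, while simulate(True) returns two distinct pairs.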

      Attachments

        1. HDFS-5590.000.patch (1 kB, Jing Zhao)
        2. HDFS-5590.001.patch (7 kB, Jing Zhao)

          People

            Assignee: Jing Zhao (jingzhao)
            Reporter: Jing Zhao (jingzhao)
            Votes: 0
            Watchers: 10
