Hadoop HDFS / HDFS-1225

Block lost when primary crashes in recoverBlock

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20-append
    • Fix Version/s: None
    • Component/s: datanode
    • Labels:
      None

      Description

      • Summary: A block is lost if the primary datanode crashes in the middle of tryUpdateBlock.
      • Setup:
      1. available datanode = 2
      2. replica = 2
      3. disks / datanode = 1
      4. failures = 1
      5. failure type = crash
      6. when/where failure happens = (see below)
      • Details:
        Suppose we have two datanodes, dn1 and dn2, and dn1 is the primary.
        The client appends to blk_X_1001 and dn1 crashes during dn1.recoverBlock,
        right after blk_X_1001.meta is renamed to blk_X_1001.meta_tmp1002.
        In this case, block X is eventually lost. Why?
        After dn1 crashes at the rename, what is left in dn1's current directory is:
        1) blk_X
        2) blk_X_1001.meta_tmp1002
        ==> This is an invalid block, because it has no meta file associated with it.
        dn2 (after the dn1 crash) now contains:
        1) blk_X
        2) blk_X_1002.meta
        (Note that the rename at dn2 has completed, because dn1 called dn2.updateBlock() before
        calling its own updateBlock().)
        However, namenode.commitBlockSynchronization is never sent to the namenode,
        because dn1 has crashed. Therefore, from the namenode's point of view, block X still has
        generation stamp (GS) 1001, which matches neither replica: the one on dn1 is invalid
        and the one on dn2 has GS 1002.
        Hence, the block is lost.
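
The sequence above can be replayed with a small simulation. This is only a sketch of the scenario as described in this report, not the datanode code: directory contents are modelled as sets of file names, and the helper names (`update_block`, `valid_generation_stamp`) are hypothetical. It also assumes, as a simplification, that a replica is usable only if it has a finalized meta file whose GS equals the GS the namenode has recorded.

```python
import re

# Finalized meta files look like blk_<id>_<GS>.meta; a leftover
# blk_<id>_<GS>.meta_tmp<newGS> file does not match this pattern.
META_RE = re.compile(r"blk_(\w+?)_(\d+)\.meta")


def valid_generation_stamp(files, block_id):
    """Return the GS of a valid replica in a directory listing, or None.

    Assumption: a replica is valid only if both the data file and a
    finalized meta file for the block are present.
    """
    if f"blk_{block_id}" not in files:
        return None
    for name in files:
        m = META_RE.fullmatch(name)
        if m and m.group(1) == block_id:
            return int(m.group(2))
    return None


def update_block(files, block_id, old_gs, new_gs, crash_after_tmp_rename=False):
    """Hypothetical model of the two-step meta rename described in the report."""
    files = set(files)
    old_meta = f"blk_{block_id}_{old_gs}.meta"
    tmp_meta = f"{old_meta}_tmp{new_gs}"
    # Step 1: blk_X_1001.meta -> blk_X_1001.meta_tmp1002
    files.remove(old_meta)
    files.add(tmp_meta)
    if crash_after_tmp_rename:
        return files  # the datanode dies here, leaving only the tmp meta file
    # Step 2: blk_X_1001.meta_tmp1002 -> blk_X_1002.meta
    files.remove(tmp_meta)
    files.add(f"blk_{block_id}_{new_gs}.meta")
    return files


initial = {"blk_X", "blk_X_1001.meta"}
namenode_gs = 1001  # GS the namenode has recorded for block X

# The primary (dn1) updates dn2 first, then crashes inside its own update.
dn2 = update_block(initial, "X", 1001, 1002)
dn1 = update_block(initial, "X", 1001, 1002, crash_after_tmp_rename=True)
# commitBlockSynchronization is never sent, so namenode_gs stays 1001.

usable = [
    name
    for name, files in (("dn1", dn1), ("dn2", dn2))
    if valid_generation_stamp(files, "X") == namenode_gs
]
print(usable)  # prints []
```

Under these assumptions neither datanode holds a replica the namenode would accept: dn1 has no finalized meta file, and dn2 has moved on to GS 1002.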

      This bug was found by our Failure Testing Service framework:
      http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
      For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
      Haryadi Gunawi (haryadi@eecs.berkeley.edu)

        Issue Links

          Activity

          Thanh Do created issue -
          Konstantin Shvachko made changes -
          Field Original Value New Value
          Affects Version/s 0.20-append [ 12315103 ]
          Affects Version/s 0.20.1 [ 12314048 ]
          Todd Lipcon made changes -
          Link This issue relates to HDFS-1263 [ HDFS-1263 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Thanh Do
            • Votes:
              0
              Watchers:
              2
