  1. Hadoop HDFS
  2. HDFS-1228

CRC does not match when retrying appending a partial block

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.20-append
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      • Summary: when appending to a partial block, it is possible that a
        retry after an exception fails due to a checksum mismatch.
        The append operation is not atomic (it should either complete or fail completely).
      • Setup:
        + # available datanodes = 2
        + # disks / datanode = 1
        + # failures = 1
        + failure type = bad disk
        + When/where failure happens = (see below)
      • Details:
        The client writes 16 bytes to dn1 and dn2. The write completes. So far so good.
        The meta file now contains a 7-byte header + a 4-byte checksum (CK1, the
        checksum for 16 bytes). The client then appends 16 more bytes, and let us assume there is an
        exception in BlockReceiver.receivePacket() at dn2. So the client knows dn2
        is bad. BUT the append at dn1 is complete (i.e., the data portion and the checksum portion
        have been written to disk, to the corresponding block file and meta file), meaning that the
        meta file at dn1 now contains a 7-byte header + a 4-byte checksum (CK2, the
        checksum for 32 bytes of data). Because dn2 has an exception, the client calls recoverBlock and
        starts the append again to dn1. dn1 receives the 16 bytes of data and verifies whether the
        pre-computed CRC (CK2) matches what it has just recalculated (CK1), which obviously does not match.
        Hence an exception is thrown and the retry fails.
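
The sequence described in the Details can be reproduced in miniature. The sketch below is not HDFS code: the Replica class, its methods, and the filled() helper are hypothetical names invented for illustration, and java.util.zip.CRC32 stands in for the datanode's checksum. It only models one replica that keeps a single checksum for its partial chunk, which is assumed to be enough to show why the retry sees CK1 on one side and CK2 on the other.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical model of the scenario; none of these names exist in HDFS.
class PartialBlockAppendSketch {

    // One replica: the block file bytes plus the single checksum kept
    // in the meta file for the partial chunk.
    static final class Replica {
        byte[] blockData = new byte[0];
        long storedCrc;

        // A successful append writes the data and overwrites the stored checksum.
        void append(byte[] more) {
            byte[] merged = Arrays.copyOf(blockData, blockData.length + more.length);
            System.arraycopy(more, 0, merged, blockData.length, more.length);
            blockData = merged;
            storedCrc = crcOf(blockData, blockData.length);
        }

        // Check made on retry: recompute the CRC over the length the client
        // believes the block has and compare it with the stored checksum.
        boolean partialChunkCrcMatches(int expectedLength) {
            return crcOf(blockData, expectedLength) == storedCrc;
        }
    }

    static long crcOf(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return crc.getValue();
    }

    static byte[] filled(int n, byte value) {
        byte[] out = new byte[n];
        Arrays.fill(out, value);
        return out;
    }

    public static void main(String[] args) {
        Replica dn1 = new Replica();

        // First write of 16 bytes: the meta file holds CK1.
        dn1.append(filled(16, (byte) 'a'));
        System.out.println("after first write, 16-byte check passes: "
                + dn1.partialChunkCrcMatches(16));

        // Append of 16 more bytes succeeds on dn1 (meta file now holds CK2)
        // while dn2 fails, so the client still treats the block as 16 bytes.
        dn1.append(filled(16, (byte) 'b'));

        // Retry: CRC recomputed over 16 bytes (CK1) is compared with CK2.
        System.out.println("on retry, 16-byte check passes: "
                + dn1.partialChunkCrcMatches(16));
    }
}
```

In this model the check only passes again if it is made against the full 32 bytes, which mirrors the point in the Summary that the first append attempt was applied on dn1 rather than failing completely.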

      This bug was found by our Failure Testing Service framework:
      http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
      For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
      Haryadi Gunawi (haryadi@eecs.berkeley.edu)

          People

            Assignee: Unassigned
            Reporter: Thanh Do (thanhdo)
