Hadoop HDFS / HDFS-900

Corrupt replicas are not tracked correctly through block report from DN

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      This one is tough to describe, but essentially the following order of events occurs:

      1. A client marks one replica of a block as corrupt by reporting it to the NN
      2. Replication is then scheduled to make a new replica of this block
      3. The replication completes, so that there are now 3 good replicas and 1 corrupt replica
      4. The DN holding the corrupt replica sends a block report. Rather than telling this DN to delete the replica, the NN instead marks it as a new good replica of the block, and schedules deletion on one of the good replicas.

      I don't know if this is a data-loss bug in the case of 1 corrupt replica with dfs.replication=2, but it seems feasible. I will attach a debug log with some commentary marked by '============>', plus a unit test patch which I can use to reproduce this behavior reliably. (It's not a proper unit test, just some edits to an existing one to show it.)
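The sequence above can be sketched with a toy NameNode model. This is purely illustrative: `replicasOnDn` and `pendingInvalidates` are invented names standing in for the NN's real block map and invalidation queue, and `processReportBuggy` mimics only the flawed behavior described here, not actual HDFS code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the race; all names here are illustrative, not real HDFS code.
class BlockReportRace {
    // Replicas the NN currently counts as good, per DN.
    static Map<String, Set<Long>> replicasOnDn = new HashMap<>();
    // Replicas queued for deletion, per DN (stand-in for the invalidation queue).
    static Map<String, Set<Long>> pendingInvalidates = new HashMap<>();

    // Flawed report handling: any reported block not already tracked for this
    // DN is accepted as a new good replica, even one queued for deletion there.
    static void processReportBuggy(String dn, long blockId) {
        replicasOnDn.computeIfAbsent(dn, d -> new HashSet<>()).add(blockId);
    }

    public static void main(String[] args) {
        long block = 900L;
        // Steps 1-3: dn1's replica is marked corrupt and queued for
        // invalidation; re-replication puts a good copy on dn2.
        pendingInvalidates.computeIfAbsent("dn1", d -> new HashSet<>()).add(block);
        processReportBuggy("dn2", block);
        // Step 4: dn1 sends a block report before deleting the replica;
        // the NN now counts the corrupt replica as good again.
        processReportBuggy("dn1", block);
        System.out.println(replicasOnDn.get("dn1").contains(block)); // prints true
    }
}
```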

      1. reportCorruptBlock.patch
        2 kB
        Konstantin Shvachko
      2. to-reproduce.patch
        6 kB
        Todd Lipcon
      3. log-commented
        34 kB
        Todd Lipcon

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/)
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #539 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/)
          Todd Lipcon added a comment -

          I'm a little concerned that this wasn't committed with a test. The fix looks good, but manual testing won't prevent a regression.
          Konstantin Shvachko added a comment -

          I just committed this.
          Konstantin Shvachko added a comment -

          test failures:
          TestFileConcurrentReader - HDFS-1401
          TestStorageRestore - HDFS-1496

          test-patch results:

               [exec] -1 overall.  
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no new tests are needed for this patch.
               [exec]                         Also please list what manual steps were performed to verify this patch.
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]     +1 system test framework.  The patch passed system test framework compile.
               [exec] ======================================================================
          

          Testing of this patch was done manually and using Todd's utility attached above.

          Jakob Homan added a comment -

          +1
          Konstantin Shvachko added a comment -

          Yes, this is indeed a bug in block report processing. After step 3 in Todd's description the NN has 3 good replicas and one corrupt. The corrupt replica is in recentInvalidatesSet, but not in the DatanodeDescriptor. That is, the replica is scheduled for deletion from the DN. See blockReceived().
          But before it is deleted from the DN, that same DN sends a block report, which contains the replica. DatanodeDescriptor.processReport() treats it as a new replica, because it is not in the DatanodeDescriptor, and as a good one, since its blockId, generationStamp, and length are in order.
          The fix is to ignore replicas that are scheduled for deletion from this DN.
          I tested this patch with the test case attached by Todd, thanks. The test passes with the fix and fails without it.
          The test case is not exactly a unit test, as it introduces changes to the FSNamesystem class for testing. So I did not include it in the patch.
          Todd, is it possible to convert your case into a real unit test?
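The fix described in this comment — skip any reported replica that is already queued for deletion on the reporting DN — can be sketched as follows. The names (`replicasOnDn`, `pendingInvalidates`, `processReport`) are invented for illustration; this is not the committed patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the fix; names are placeholders, not the real patch.
class BlockReportFix {
    // Replicas the NN currently counts as good, per DN.
    static Map<String, Set<Long>> replicasOnDn = new HashMap<>();
    // Stand-in for recentInvalidatesSet: replicas queued for deletion, per DN.
    static Map<String, Set<Long>> pendingInvalidates = new HashMap<>();

    // Fixed report handling: a reported replica that is already scheduled for
    // deletion on this DN is ignored instead of being re-added as good.
    static void processReport(String dn, long blockId) {
        Set<Long> toDelete = pendingInvalidates.get(dn);
        if (toDelete != null && toDelete.contains(blockId)) {
            return; // queued for invalidation; do not count it as a good replica
        }
        replicasOnDn.computeIfAbsent(dn, d -> new HashSet<>()).add(blockId);
    }

    public static void main(String[] args) {
        long block = 900L;
        pendingInvalidates.computeIfAbsent("dn1", d -> new HashSet<>()).add(block);
        processReport("dn1", block); // report arrives before the deletion happens
        System.out.println(replicasOnDn.containsKey("dn1")); // prints false
    }
}
```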

          Todd Lipcon added a comment -

          In discussion with Nigel, we'd like to mark this as a blocker pending further investigation. If we determine it's not a regression since 0.20, we'll downgrade the priority.
          Todd Lipcon added a comment -

          Yes, I believe this reproduces in 0.21 and 0.22. (The to-reproduce.patch was against trunk at the time, though it is likely out of date now.)
          Thanh Do added a comment -

          Todd, do you see this on 0.21.0?
          Is this a bug in the NN's handling of corrupt replicas?

            People

            • Assignee: Konstantin Shvachko
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 5
