Hadoop Common / HADOOP-4910

NameNode should exclude corrupt replicas when choosing excessive replicas to delete

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.18.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently, when the NameNode handles an over-replicated block in FSNamesystem#processOverReplicatedBlock, it excludes replicas that are already in excessReplicateMap and those on decommissioned nodes, but it treats a corrupt replica as a valid one. This may lead to the unnecessary deletion of additional replicas and thus cause data loss. It should exclude corrupt replicas as well.
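To make the selection rule concrete, here is a minimal Java sketch, assuming a simplified model in which each replica carries its DataNode name plus "decommissioned" and "corrupt" flags. The class ExcessReplicaSketch, the Replica type and chooseExcess are hypothetical names for illustration only; they are not the FSNamesystem API, and the victim-choice policy (taking candidates from the end of the list) is a placeholder for the NameNode's real placement-aware choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ExcessReplicaSketch {

    /** Hypothetical, simplified view of one replica as the NameNode sees it. */
    static final class Replica {
        final String datanode;
        final boolean decommissioned;
        final boolean corrupt;

        Replica(String datanode, boolean decommissioned, boolean corrupt) {
            this.datanode = datanode;
            this.decommissioned = decommissioned;
            this.corrupt = corrupt;
        }
    }

    /**
     * Picks replicas to delete so that at most {@code replication} usable
     * replicas remain. Replicas already scheduled as excess, on
     * decommissioned nodes, or marked corrupt are neither counted nor chosen.
     */
    static List<Replica> chooseExcess(List<Replica> replicas,
                                      Set<String> alreadyExcess,
                                      int replication) {
        List<Replica> candidates = new ArrayList<>();
        for (Replica r : replicas) {
            if (alreadyExcess.contains(r.datanode)) continue; // already scheduled for deletion
            if (r.decommissioned) continue;                   // being drained, not counted
            if (r.corrupt) continue;                          // the exclusion this issue asks for
            candidates.add(r);
        }
        // Placeholder policy: take victims from the end of the candidate list.
        List<Replica> toDelete = new ArrayList<>();
        int excess = candidates.size() - replication;
        for (int i = 0; i < excess; i++) {
            toDelete.add(candidates.get(candidates.size() - 1 - i));
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // Replication factor 2 with three replicas, one of them corrupt.
        List<Replica> replicas = List.of(
                new Replica("dn1", false, false),
                new Replica("dn2", false, true),
                new Replica("dn3", false, false));
        List<Replica> toDelete = chooseExcess(replicas, Set.of(), 2);
        System.out.println("replicas to delete: " + toDelete.size()); // prints "replicas to delete: 0"
    }
}
```

Without the corrupt check, the same call would count three replicas, treat one as excess and delete dn3, leaving dn1 as the only good copy next to the corrupt one on dn2.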

      1. overReplicated.patch
        1 kB
        Hairong Kuang
      2. overReplicated1.patch
        6 kB
        Hairong Kuang
      3. overReplicated2-br18.patch
        6 kB
        Hairong Kuang
      4. overReplicated2.patch
        6 kB
        Hairong Kuang

        Activity

        Zheng Shao added a comment -

        Is this only in 0.17.0, or later versions as well?

        Hairong Kuang added a comment -

        I am still working on a JUnit test case, but I am attaching the fix first.

        Yes, this bug affects 0.17 as well as later releases.

        Hairong Kuang added a comment -

        This patch has a unit test. Without the fix, the unit test will fail, showing that the block gets lost.
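Why the block can be lost is easiest to see with a little arithmetic. The sketch below is a hypothetical back-of-the-envelope model, not the unit test from the attached patch: replicas are modelled as booleans (true meaning corrupt), and it assumes the worst case in which every replica chosen for deletion is a good one. OverReplicationScenario and goodReplicasLeft are illustrative names. With a replication factor of 1 and one good plus one corrupt replica, counting the corrupt copy as valid makes the NameNode delete one replica, which can leave no good copy at all.

```java
public class OverReplicationScenario {

    /**
     * Number of good replicas left after the NameNode trims a block down to
     * {@code replication}, assuming every deleted replica is a good one.
     * {@code corrupt[i]} is true when replica i is corrupt.
     */
    static int goodReplicasLeft(boolean[] corrupt, int replication, boolean excludeCorrupt) {
        int good = 0;
        for (boolean c : corrupt) {
            if (!c) good++;
        }
        // How many replicas the NameNode believes are usable.
        int counted = excludeCorrupt ? good : corrupt.length;
        int excess = Math.max(0, counted - replication);
        // Worst case: all deletions hit good replicas.
        return good - Math.min(good, excess);
    }

    public static void main(String[] args) {
        boolean[] replicas = {false, true}; // one good replica, one corrupt
        System.out.println("without fix: " + goodReplicasLeft(replicas, 1, false)); // prints "without fix: 0"
        System.out.println("with fix: " + goodReplicasLeft(replicas, 1, true));     // prints "with fix: 1"
    }
}
```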

        Raghu Angadi added a comment -

        +1, patch looks good. Simple but important fix.
        Minor nit: I would suggest removing references to this JIRA from the code comments. Otherwise they give the impression that the reader must read this JIRA to understand what is happening, which is not necessary.

        Hairong Kuang added a comment -

        This is a patch for 0.18. Incorporated Raghu's comment.

        Hairong Kuang added a comment -

        Patch for the trunk.

        Hairong Kuang added a comment -

        Ant test-patch passed
        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 9 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        Ant test-core failed with a known junit test failure reported at HADOOP-4907. All other unit tests passed.

        Hairong Kuang added a comment -

        I've just committed this.


          People

          • Assignee: Hairong Kuang
          • Reporter: Hairong Kuang
          • Votes: 0
          • Watchers: 1