Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-3256

HDFS considers blocks under-replicated if topology script is configured with only 1 rack

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.0-alpha, 0.23.7
    • Component/s: None
    • Labels:
      None

      Description

      HDFS treats the mere presence of a topology script being configured as evidence that there are multiple racks. If there is in fact only a single rack, the NN will try to place the blocks on at least two racks, and thus blocks will be considered to be under-replicated.

      1. HDFS-3256.patch
        8 kB
        Aaron T. Myers
      2. HDFS-3256.patch
        8 kB
        Aaron T. Myers
      3. HDFS-3256.patch
        8 kB
        Aaron T. Myers
      4. HDFS-3256.patch
        8 kB
        Aaron T. Myers
      5. HDFS-3256.patch
        8 kB
        Aaron T. Myers
      6. HDFS-3256.patch
        3 kB
        Aaron T. Myers

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #553 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/553/)
          svn merge -c 1325531 Merging from trunk to branch-0.23 to fix HDFS-3256. (Revision 1455991)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455991
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #553 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/553/ ) svn merge -c 1325531 Merging from trunk to branch-0.23 to fix HDFS-3256 . (Revision 1455991) Result = SUCCESS kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455991 Files : /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Hide
          Kihwal Lee added a comment -

          Merged to branch-0.23.

          Show
          Kihwal Lee added a comment - Merged to branch-0.23.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1048 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1048/)
          HDFS-3256. HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1048 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1048/ ) HDFS-3256 . HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1013 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1013/)
          HDFS-3256. HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1013 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1013/ ) HDFS-3256 . HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2081 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2081/)
          HDFS-3256. HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531)

          Result = ABORTED
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2081 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2081/ ) HDFS-3256 . HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531) Result = ABORTED atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2067 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2067/)
          HDFS-3256. HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2067 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2067/ ) HDFS-3256 . HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2140 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2140/)
          HDFS-3256. HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2140 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2140/ ) HDFS-3256 . HDFS considers blocks under-replicated if topology script is configured with only 1 rack. Contributed by Aaron T. Myers. (Revision 1325531) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325531 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java
          Hide
          Aaron T. Myers added a comment -

          Thanks a lot for the reviews, Todd and Tom. I've just committed this to trunk and branch-2.

          Show
          Aaron T. Myers added a comment - Thanks a lot for the reviews, Todd and Tom. I've just committed this to trunk and branch-2.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522462/HDFS-3256.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2266//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2266//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522462/HDFS-3256.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2266//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2266//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          +1

          Show
          Todd Lipcon added a comment - +1
          Hide
          Aaron T. Myers added a comment -

          In investigating the TestBlocksWithNotEnoughRacks failure, I came across a bug in one of the test util methods, filed here: HDFS-3266 and Todd found another issue with TestBlocksWithNotEnoughRacks specifically, filed here: HDFS-3267

          I've been looping TestBlocksWithNotEnoughRacks for an hour now, with no failures.

          I'm pretty confident that the TestLeaseRecovery2 failure is this JIRA: HDFS-3181

          I've also been looping TestLeaseRecovery2 now for a few minutes and haven't had any failures.

          Show
          Aaron T. Myers added a comment - In investigating the TestBlocksWithNotEnoughRacks failure, I came across a bug in one of the test util methods, filed here: HDFS-3266 and Todd found another issue with TestBlocksWithNotEnoughRacks specifically, filed here: HDFS-3267 I've been looping TestBlocksWithNotEnoughRacks for an hour now, with no failures. I'm pretty confident that the TestLeaseRecovery2 failure is this JIRA: HDFS-3181 I've also been looping TestLeaseRecovery2 now for a few minutes and haven't had any failures.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522462/HDFS-3256.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.TestLeaseRecovery2

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2264//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2264//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522462/HDFS-3256.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestLeaseRecovery2 +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2264//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2264//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522445/HDFS-3256.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2261//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2261//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522445/HDFS-3256.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2261//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2261//console This message is automatically generated.
          Hide
          Aaron T. Myers added a comment -

          Too many "processing"s. You've been hanging out with Eli too much.

          Chicago Manual of Style says repeating words unnecessarily is bad form? Who knew?

          Also, I'd say we should just log it at DEBUG level for the "Not checking" case.

          Good thinking. Attached patch makes this change, as well as removing a "processing" from the log message.

          Show
          Aaron T. Myers added a comment - Too many "processing"s. You've been hanging out with Eli too much. Chicago Manual of Style says repeating words unnecessarily is bad form? Who knew? Also, I'd say we should just log it at DEBUG level for the "Not checking" case. Good thinking. Attached patch makes this change, as well as removing a "processing" from the log message.
          Hide
          Todd Lipcon added a comment -

          not yet processing processing repl queues

          Too many "processing"s. You've been hanging out with Eli too much.

          Also, I'd say we should just log it at DEBUG level for the "Not checking" case.

          Show
          Todd Lipcon added a comment - not yet processing processing repl queues Too many "processing"s. You've been hanging out with Eli too much. Also, I'd say we should just log it at DEBUG level for the "Not checking" case.
          Hide
          Aaron T. Myers added a comment -

          Thanks a lot for the review, Todd. Here's an updated patch which adds the requested log message. The relevant difference since the last patch is just:

          +      String message = "DN " + node + " joining cluster has expanded a formerly " +
          +          "single-rack cluster to be multi-rack. ";
          +      if (namesystem.isPopulatingReplQueues()) {
          +        message += "Re-checking all blocks for replication, since they should " +
          +            "now be replicated cross-rack";
          +      } else {
          +        message += "Not checking for mis-replicated blocks because this NN is " +
          +            "not yet processing processing repl queues.";
          +      }
          +      LOG.info(message);
          
          Show
          Aaron T. Myers added a comment - Thanks a lot for the review, Todd. Here's an updated patch which adds the requested log message. The relevant difference since the last patch is just: + String message = "DN " + node + " joining cluster has expanded a formerly " + + "single-rack cluster to be multi-rack. " ; + if (namesystem.isPopulatingReplQueues()) { + message += "Re-checking all blocks for replication, since they should " + + "now be replicated cross-rack" ; + } else { + message += "Not checking for mis-replicated blocks because this NN is " + + "not yet processing processing repl queues." ; + } + LOG.info(message);
          Hide
          Todd Lipcon added a comment -

          patch looks good. One minor comment: can you please add an INFO log saying something like: Datanode <blah blah> joining cluster has expanded a formerly single-rack cluster to multi-rack. Re-checking all blocks for replication, since they should now be replicated cross-rack"

          Show
          Todd Lipcon added a comment - patch looks good. One minor comment: can you please add an INFO log saying something like: Datanode <blah blah> joining cluster has expanded a formerly single-rack cluster to multi-rack. Re-checking all blocks for replication, since they should now be replicated cross-rack"
          Hide
          Aaron T. Myers added a comment -

          Great point, Todd. Looks like that check wasn't quite as conservative as I'd thought.

          Here's an updated patch which only calls BlockManager#processMisReplicatedBlocks if we're becoming a multi-rack cluster AND we're already processing repl queues. This makes sense, since if we're not already processing repl queues then we're guaranteed that processMisReplicatedBlocks will be called later when we do start processing repl queues, triggered by leaving safemode, becoming active, etc.

          Show
          Aaron T. Myers added a comment - Great point, Todd. Looks like that check wasn't quite as conservative as I'd thought. Here's an updated patch which only calls BlockManager#processMisReplicatedBlocks if we're becoming a multi-rack cluster AND we're already processing repl queues. This makes sense, since if we're not already processing repl queues then we're guaranteed that processMisReplicatedBlocks will be called later when we do start processing repl queues, triggered by leaving safemode, becoming active, etc.
          Hide
          Tom White added a comment -

          Thanks for clarifying, Aaron. I don't think AbstractDNSToSwitchMapping.isSingleSwitch is used (yet).

          Show
          Tom White added a comment - Thanks for clarifying, Aaron. I don't think AbstractDNSToSwitchMapping.isSingleSwitch is used (yet).
          Hide
          Todd Lipcon added a comment -

          Is this check actually correct? I think you need to check whether the repl queues are initialized, explicitly. For example, you could enter manual safe mode, in which case the repl queues are still being tracked, and it's incorrect to not call processMisReplicatedBlocks.

          Show
          Todd Lipcon added a comment - Is this check actually correct? I think you need to check whether the repl queues are initialized, explicitly. For example, you could enter manual safe mode, in which case the repl queues are still being tracked, and it's incorrect to not call processMisReplicatedBlocks.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522394/HDFS-3256.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2259//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2259//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522394/HDFS-3256.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2259//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2259//console This message is automatically generated.
          Hide
          Aaron T. Myers added a comment -

          Great catch, Todd.

          Here's an updated patch which will only call processMisReplicatedBlocks when becoming multi-rack if the NN is not in safemode and not in standby state. This behavior is a tad more conservative than the check that's done in the SafeMode class, but that seems fine. If the cluster becomes multi-rack while in safmode or standby state, we're guaranteed that processMisReplicatedBlocks will be called when the NN exits safemode or becomes active.

          Show
          Aaron T. Myers added a comment - Great catch, Todd. Here's an updated patch which will only call processMisReplicatedBlocks when becoming multi-rack if the NN is not in safemode and not in standby state. This behavior is a tad more conservative than the check that's done in the SafeMode class, but that seems fine. If the cluster becomes multi-rack while in safmode or standby state, we're guaranteed that processMisReplicatedBlocks will be called when the NN exits safemode or becomes active.
          Hide
          Todd Lipcon added a comment -

          I think there's an issue here with safemode's delayed initialization of repl queues. As the DNs are checking in, when the cluster transitions from single-rack to multi-rack, it will call processMisReplicatedBlocks, even if the threshold (DFS_NAMENODE_REPL_QUEUE_THRESHOLD_PCT_KEY) hasn't been crossed. I think you need to somehow tie this back to the flag in safemode which determines whether misreplicated blocks have been processed yet.

          Show
          Todd Lipcon added a comment - I think there's an issue here with safemode's delayed initialization of repl queues. As the DNs are checking in, when the cluster transitions from single-rack to multi-rack, it will call processMisReplicatedBlocks, even if the threshold (DFS_NAMENODE_REPL_QUEUE_THRESHOLD_PCT_KEY) hasn't been crossed. I think you need to somehow tie this back to the flag in safemode which determines whether misreplicated blocks have been processed yet.
          Hide
          Todd Lipcon added a comment -

          Linking to HDFS-15 - this is a regression caused by that JIRA, I'm pretty sure.

          Show
          Todd Lipcon added a comment - Linking to HDFS-15 - this is a regression caused by that JIRA, I'm pretty sure.
          Hide
          Aaron T. Myers added a comment -

          How does this relate to AbstractDNSToSwitchMapping.isSingleSwitch that Steve added recently?

          I think the two are unrelated. This change only determines rack cardinality using the NetworkTopology class, which itself is fed by the configured DNSToSwitchMapping. This is the same methodology that the BlockManager has thus far been using to determine if replicas for a given block exist on more than one rack or not.

          Show
          Aaron T. Myers added a comment - How does this relate to AbstractDNSToSwitchMapping.isSingleSwitch that Steve added recently? I think the two are unrelated. This change only determines rack cardinality using the NetworkTopology class, which itself is fed by the configured DNSToSwitchMapping. This is the same methodology that the BlockManager has thus far been using to determine if replicas for a given block exist on more than one rack or not.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522367/HDFS-3256.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2256//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2256//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522367/HDFS-3256.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2256//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2256//console This message is automatically generated.
          Hide
          Tom White added a comment -

          How does this relate to AbstractDNSToSwitchMapping.isSingleSwitch that Steve added recently?

          Show
          Tom White added a comment - How does this relate to AbstractDNSToSwitchMapping.isSingleSwitch that Steve added recently?
          Hide
          Aaron T. Myers added a comment -

          Aaron, could you please describe the solution along with the patch.

          Sure. In the original patch, the fix I had was just to check if only a single rack was configured and not count a block as being on too-few racks if there in fact was only one rack.

          However, here's an updated patch which should address the test failures.

          There were two reasons for the test failures:

          1. There were 2 tests which asserted that blocks were under-replicated when only one rack was configured, which is no longer the case.
          2. There was 1 test which had blocks for a single file spread across two racks, and asserted that after all the nodes on one of the racks died, that blocks were then considered misreplicated.

          Fixing #1 above just consisted of changing the tests to not assert that behavior.

          Fixing #2 was a little more involved. I added a boolean to DatanodeManager to track whether or not the cluster had ever consisted of multiple racks. This is updated whenever a node is added to the cluster. When a cluster goes from being sinlge-rack to multi-rack, we call BlockManager#processMisReplicatedBlocks to make sure that we look for blocks which were not considered under-replicated when there was only a single rack, that should now be on separate racks.

          Then, the check for whether or not a block is on enough racks becomes "is the desired replication only 1, OR has there only ever been one rack configured in this cluster."

          Show
          Aaron T. Myers added a comment - Aaron, could you please describe the solution along with the patch. Sure. In the original patch, the fix I had was just to check if only a single rack was configured and not count a block as being on too-few racks if there in fact was only one rack. However, here's an updated patch which should address the test failures. There were two reasons for the test failures: There were 2 tests which asserted that blocks were under-replicated when only one rack was configured, which is no longer the case. There was 1 test which had blocks for a single file spread across two racks, and asserted that after all the nodes on one of the racks died, that blocks were then considered misreplicated. Fixing #1 above just consisted of changing the tests to not assert that behavior. Fixing #2 was a little more involved. I added a boolean to DatanodeManager to track whether or not the cluster had ever consisted of multiple racks. This is updated whenever a node is added to the cluster. When a cluster goes from being sinlge-rack to multi-rack, we call BlockManager#processMisReplicatedBlocks to make sure that we look for blocks which were not considered under-replicated when there was only a single rack, that should now be on separate racks. Then, the check for whether or not a block is on enough racks becomes "is the desired replication only 1, OR has there only ever been one rack configured in this cluster."
          Hide
          Suresh Srinivas added a comment -

          Aaron, could you please describe the solution along with the patch. This helps folks who do not look at the code understand the solution and the behavior of HDFS.

          Show
          Suresh Srinivas added a comment - Aaron, could you please describe the solution along with the patch. This helps folks who do not look at the code understand the solution and the behavior of HDFS.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522322/HDFS-3256.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2252//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2252//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522322/HDFS-3256.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2252//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2252//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          Looks good. +1 pending jenkins

          Show
          Todd Lipcon added a comment - Looks good. +1 pending jenkins
          Hide
          Aaron T. Myers added a comment -

          Here's a patch for trunk which should address the issue.

          Show
          Aaron T. Myers added a comment - Here's a patch for trunk which should address the issue.

            People

            • Assignee:
              Aaron T. Myers
              Reporter:
              Aaron T. Myers
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development