Hadoop HDFS / HDFS-4898

BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 2.0.4-alpha
    • Fix Version/s: 1.3.0, 2.1.1-beta
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      As currently implemented, BlockPlacementPolicyWithNodeGroup does not properly fall back to the local rack when no nodes are available in remote racks, resulting in an improper NotEnoughReplicasException.

      BlockPlacementPolicyWithNodeGroup.java
        @Override
        protected void chooseRemoteRack(int numOfReplicas,
            DatanodeDescriptor localMachine, HashMap<Node, Node> excludedNodes,
            long blocksize, int maxReplicasPerRack, List<DatanodeDescriptor> results,
            boolean avoidStaleNodes) throws NotEnoughReplicasException {
          int oldNumOfReplicas = results.size();
          // randomly choose one node from remote racks
          try {
            chooseRandom(
                numOfReplicas,
                "~" + NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()),
                excludedNodes, blocksize, maxReplicasPerRack, results,
                avoidStaleNodes);
          } catch (NotEnoughReplicasException e) {
            chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
                localMachine.getNetworkLocation(), excludedNodes, blocksize,
                maxReplicasPerRack, results, avoidStaleNodes);
          }
        }
      

      As currently coded, the chooseRandom() call in the catch block can never succeed, because the set of nodes under the node path passed in (e.g. /rack1/nodegroup1) is entirely contained in the set of excluded nodes: both are the set of nodes in the same nodegroup as the node chosen for the first replica.

      The fallback chooseRandom() call in the catch block should instead pass the complement of the scope used by the initial chooseRandom() call in the try block, i.e. the local rack itself (e.g. /rack1), namely:

      NetworkTopology.getFirstHalf(localMachine.getNetworkLocation())
      

      This will yield the proper fallback behavior of choosing a random node from within the same rack, while still excluding the nodes in the same nodegroup.
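The scope argument above can be illustrated with a small, self-contained toy model. This is only a sketch, not the Hadoop implementation: the class `ScopeFallbackSketch`, the node names `dn1` to `dn3`, and the `candidates()` helper are hypothetical, and `firstHalf()` assumes that `NetworkTopology.getFirstHalf()` simply drops the last path component of a network location.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of scope selection with node groups; not the Hadoop implementation. */
final class ScopeFallbackSketch {

  /** Assumed behaviour of NetworkTopology.getFirstHalf(): drop the last path component. */
  static String firstHalf(String location) {
    return location.substring(0, location.lastIndexOf('/'));
  }

  /** Nodes eligible under a scope; a leading "~" means "everything outside this scope". */
  static List<String> candidates(Map<String, String> nodeToLocation,
                                 String scope, Set<String> excluded) {
    boolean negate = scope.startsWith("~");
    String path = negate ? scope.substring(1) : scope;
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : nodeToLocation.entrySet()) {
      String loc = e.getValue();
      boolean inScope = loc.equals(path) || loc.startsWith(path + "/");
      if (inScope != negate && !excluded.contains(e.getKey())) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  /** Hypothetical single-rack cluster with two node groups. */
  static Map<String, String> singleRackCluster() {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("dn1", "/rack1/nodegroup1");
    m.put("dn2", "/rack1/nodegroup1");
    m.put("dn3", "/rack1/nodegroup2");
    return m;
  }

  public static void main(String[] args) {
    Map<String, String> cluster = singleRackCluster();
    // First replica is on dn1, so its whole node group is excluded.
    Set<String> excluded = new HashSet<>(Arrays.asList("dn1", "dn2"));
    String local = cluster.get("dn1");

    // Remote racks only: nothing available in a one-rack cluster.
    System.out.println(candidates(cluster, "~" + firstHalf(local), excluded));
    // Fallback scoped to the node group: every node in scope is excluded.
    System.out.println(candidates(cluster, local, excluded));
    // Fallback scoped to the rack: the other node group is still eligible.
    System.out.println(candidates(cluster, firstHalf(local), excluded));
  }
}
```

On this one-rack layout, only the rack-level scope leaves a candidate (`dn3`); the node-group scope is empty for the same reason the description gives.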

      1. h4898_20130809_b-1.patch
        1 kB
        Tsz Wo Nicholas Sze
      2. h4898_20130809.patch
        2 kB
        Tsz Wo Nicholas Sze

          Activity

          Eric Sirianni added a comment -
          index 8cb072b..302981f 100644
          --- a/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlockPlacementPolicyWithNodeGroup.java
          +++ b/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlockPlacementPolicyWithNodeGroup.java
          @@ -178,7 +178,7 @@ public class BlockPlacementPolicyWithNodeGroup extends BlockPlacementPolicyDefau
                     avoidStaleNodes);
               } catch (NotEnoughReplicasException e) {
                 chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
          -          localMachine.getNetworkLocation(), excludedNodes, blocksize,
          +          NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()), excludedNodes, blocksize,
                     maxReplicasPerRack, results, avoidStaleNodes);
               }
             }
          
          Tsz Wo Nicholas Sze added a comment -

          Eric, good catch. Let's use a local variable to store the network location, i.e.

          final String networkLocation = NetworkTopology.getFirstHalf(localMachine.getNetworkLocation());
          

          Could you post a patch? Thanks.
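For illustration only, the shape of this suggestion can be sketched in a toy model: the rack location is computed once into a local variable and reused for both the remote-rack scope and the fallback scope. This is not the committed patch. `RemoteRackFallbackSketch`, `choose()`, `NotEnoughNodesException` and the `locations` map are hypothetical stand-ins, and `choose()` is deterministic rather than random.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the try/fallback flow; not the Hadoop implementation. */
final class RemoteRackFallbackSketch {

  /** Stand-in for NotEnoughReplicasException. */
  static final class NotEnoughNodesException extends Exception {}

  /** Hypothetical node-to-location map, e.g. "dn1" -> "/rack1/nodegroup1". */
  final Map<String, String> locations = new LinkedHashMap<>();

  /** Picks n nodes in scope (a leading "~" negates it), skipping excluded ones. */
  void choose(int n, String scope, Set<String> excluded, List<String> results)
      throws NotEnoughNodesException {
    boolean negate = scope.startsWith("~");
    String path = negate ? scope.substring(1) : scope;
    for (Map.Entry<String, String> e : locations.entrySet()) {
      if (n == 0) {
        break;
      }
      boolean inScope = (e.getValue() + "/").startsWith(path + "/");
      if (inScope != negate && !excluded.contains(e.getKey())) {
        results.add(e.getKey());
        excluded.add(e.getKey()); // a chosen node must not be chosen again
        n--;
      }
    }
    if (n > 0) {
      throw new NotEnoughNodesException();
    }
  }

  /** Tries remote racks first, then the local rack for whatever is still missing. */
  void chooseRemoteRack(int numOfReplicas, String localLocation,
                        Set<String> excluded, List<String> results)
      throws NotEnoughNodesException {
    final int oldNumOfReplicas = results.size();
    // Computed once and reused for both scopes (assumes getFirstHalf()
    // drops the last path component of the location).
    final String rackLocation =
        localLocation.substring(0, localLocation.lastIndexOf('/'));
    try {
      choose(numOfReplicas, "~" + rackLocation, excluded, results);
    } catch (NotEnoughNodesException e) {
      // Ask the local rack only for the replicas the remote pass did not place.
      choose(numOfReplicas - (results.size() - oldNumOfReplicas),
          rackLocation, excluded, results);
    }
  }
}
```

Keeping the rack location in one variable makes it harder for the try and catch branches to drift apart again, which is how the original bug arose.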

          Eric Sirianni added a comment -

          Thanks Nicholas - yes, I will update the patch based on your suggestion.

          Eric Sirianni added a comment -

          Patch to BlockPlacementPolicyWithNodeGroup

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12596627/HDFS-4898.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4777//console

          This message is automatically generated.

          Eric Sirianni added a comment -

          Sorry - let me regenerate the patch against trunk and use the proper "--no-prefix" flag to diff.

          Eric Sirianni added a comment -

          Patch generated against trunk with --no-prefix option.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12596663/HDFS-4898.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestCrcCorruption

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4779//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4779//console

          This message is automatically generated.

          Eric Sirianni added a comment -

          Nicholas - as discussed offline, from a legal perspective, I'm not yet able to contribute patches. I hope to get this worked out soon with my employer, but for now, I'm reassigning the JIRA to you. Thanks.

          Tsz Wo Nicholas Sze added a comment -

          h4898_20130809.patch: follows Eric's idea.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12597086/h4898_20130809.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4799//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4799//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          The failure of TestBlocksWithNotEnoughRacks is not related. It does not use BlockPlacementPolicyWithNodeGroup at all.

          Suresh Srinivas added a comment -

          +1 for the patch. We should add a unit test for this.

          Tsz Wo Nicholas Sze added a comment -

          Thanks Suresh for reviewing the patch and thanks Eric for the idea.

          I have committed this.

          Tsz Wo Nicholas Sze added a comment -

          h4898_20130809_b-1.patch: for branch-1.

          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #4265 (See https://builds.apache.org/job/Hadoop-trunk-Commit/4265/)
          HDFS-4898. BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1514156)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java

          Tsz Wo Nicholas Sze added a comment -

          Also committed the branch-1 patch.

          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #302 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/302/)
          HDFS-4898. BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1514156)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1492 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1492/)
          HDFS-4898. BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1514156)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1519 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1519/)
          HDFS-4898. BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1514156)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java

          Junping Du added a comment -

          Thanks for fixing it, Eric and Nicholas! Link it to HADOOP-8468.


            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Eric Sirianni
            • Votes:
              0
              Watchers:
              6