Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4351

Fix BlockPlacementPolicyDefault#chooseTarget when avoiding stale nodes

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 3.0.0
    • Fix Version/s: 1.2.0, 2.0.3-alpha
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Target Version/s:

      Description

      There's a bug in BlockPlacementPolicyDefault#chooseTarget with stale node avoidance enabled (HDFS-3912). If a NotEnoughReplicasException is thrown in the call to chooseRandom(), numOfReplicas is not updated together with the partial result in result since it is pass by value. The retry call to chooseTarget then uses this incorrect value.

      This can be seen if you enable stale node detection for TestReplicationPolicy#testChooseTargetWithMoreThanAvaiableNodes().

      1. hdfs-4351.patch
        4 kB
        Andrew Wang
      2. hdfs-4351-2.patch
        4 kB
        Andrew Wang
      3. hdfs-4351-3.patch
        5 kB
        Andrew Wang
      4. hdfs-4351-4.patch
        4 kB
        Andrew Wang
      5. hdfs-4351-branch-1-1.patch
        6 kB
        Andrew Wang

        Activity

        Hide
        Andrew Wang added a comment -

        Fix attached.

        Kind of a trivial test case, but it doesn't involve a new MiniDFSCluster so it's cheap. I also took the opportunity to fix the spelling mistake in the testcase name.

        Show
        Andrew Wang added a comment - Fix attached. Kind of a trivial test case, but it doesn't involve a new MiniDFSCluster so it's cheap. I also took the opportunity to fix the spelling mistake in the testcase name.
        Hide
        Andrew Wang added a comment -

        Sorry for changing the patch again so quickly, but I wanted to make the surrounding comments a bit more clear too.

        Show
        Andrew Wang added a comment - Sorry for changing the patch again so quickly, but I wanted to make the surrounding comments a bit more clear too.
        Hide
        Jing Zhao added a comment -

        Yeah, we should use an updated numOfReplicas for the chooseTarget when a NotEnoughReplicasException is thrown.
        For the current patch,

        // Required since chooseRandom() is passed numOfReplicas by value,
        // so it's not updated when returning a partial result
        numOfReplicas = oldNumOfReplicas - results.size();
        

        Because the value of the oldNumOfReplicas is set to the initial value of numOfReplicas, the above calculation may be wrong if the initial results list is not empty? Do we need to use "totalReplicasExpected - results.size()" instead?

        Besides, for the new testcase, I think maybe it's better to reset AvoidStaleDataNodesForWrite to false in the end (so that it would not affect other possible new testcases added after testChooseTargetWithMoreThanAvailableNodesWithStaleness)?

        Show
        Jing Zhao added a comment - Yeah, we should use an updated numOfReplicas for the chooseTarget when a NotEnoughReplicasException is thrown. For the current patch, // Required since chooseRandom() is passed numOfReplicas by value, // so it's not updated when returning a partial result numOfReplicas = oldNumOfReplicas - results.size(); Because the value of the oldNumOfReplicas is set to the initial value of numOfReplicas, the above calculation may be wrong if the initial results list is not empty? Do we need to use "totalReplicasExpected - results.size()" instead? Besides, for the new testcase, I think maybe it's better to reset AvoidStaleDataNodesForWrite to false in the end (so that it would not affect other possible new testcases added after testChooseTargetWithMoreThanAvailableNodesWithStaleness)?
        Hide
        Andrew Wang added a comment -

        Thanks for the review Jing! I addressed your comment on the test case in this version of the patch; good catch.

        Regarding numOfReplicas, my reading is that it's the # of additional nodes that need to be added to the results list. I inferred this since totalReplicasExpected should be constant, and is set to numOfReplicas + results.size() in chooseTarget().

        Show
        Andrew Wang added a comment - Thanks for the review Jing! I addressed your comment on the test case in this version of the patch; good catch. Regarding numOfReplicas , my reading is that it's the # of additional nodes that need to be added to the results list. I inferred this since totalReplicasExpected should be constant, and is set to numOfReplicas + results.size() in chooseTarget() .
        Hide
        Jing Zhao added a comment -

        Regarding numOfReplicas, my reading is that it's the # of additional nodes that need to be added to the results list.

        Yes. So I think the new numOfReplicas passed into the chooseTarget when a NotEnoughReplicasException is thrown should be calculated based on the change of the number of nodes in the results list. So it can be either "oldNumOfReplicas - (number of new nodes in results)" or "oldNumOfReplicas + (initial number of nodes in results) - (current number of nodes in results)".

        Show
        Jing Zhao added a comment - Regarding numOfReplicas, my reading is that it's the # of additional nodes that need to be added to the results list. Yes. So I think the new numOfReplicas passed into the chooseTarget when a NotEnoughReplicasException is thrown should be calculated based on the change of the number of nodes in the results list. So it can be either "oldNumOfReplicas - (number of new nodes in results)" or "oldNumOfReplicas + (initial number of nodes in results) - (current number of nodes in results)".
        Hide
        Andrew Wang added a comment -

        Looking at it again, you're right on totalReplicasExpected. Patch now addresses both.

        Show
        Andrew Wang added a comment - Looking at it again, you're right on totalReplicasExpected. Patch now addresses both.
        Hide
        Jing Zhao added a comment -

        Thanks Andrew! +1 pending Jenkins.

        Show
        Jing Zhao added a comment - Thanks Andrew! +1 pending Jenkins.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12562957/hdfs-4351.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3711//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3711//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562957/hdfs-4351.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3711//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3711//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12562986/hdfs-4351-4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3713//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3713//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562986/hdfs-4351-4.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3713//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3713//console This message is automatically generated.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Hi Andrew, could you also post a patch for branch-1?

        Show
        Tsz Wo Nicholas Sze added a comment - Hi Andrew, could you also post a patch for branch-1?
        Hide
        Andrew Wang added a comment -

        Hi Nicholas, branch-1 patch attached.

        I couldn't run the added tests because my branch-1 environment is messed up, but maybe you could give it a spin for me.

        Show
        Andrew Wang added a comment - Hi Nicholas, branch-1 patch attached. I couldn't run the added tests because my branch-1 environment is messed up, but maybe you could give it a spin for me.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12563083/hdfs-4351-branch-1-1.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3717//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563083/hdfs-4351-branch-1-1.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3717//console This message is automatically generated.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        > I couldn't run the added tests because my branch-1 environment is messed up, but maybe you could give it a spin for me.

        Sure, Jing or I will test the branch-1 patch.

        Show
        Tsz Wo Nicholas Sze added a comment - > I couldn't run the added tests because my branch-1 environment is messed up, but maybe you could give it a spin for me. Sure, Jing or I will test the branch-1 patch.
        Hide
        Andrew Wang added a comment -

        Just wanted to note that I fixed my branch-1 env and the test passes.

        Show
        Andrew Wang added a comment - Just wanted to note that I fixed my branch-1 env and the test passes.
        Hide
        Jing Zhao added a comment -

        I also run test-patch for Andrew, and the result looks good:

        -1 overall.  
        
            +1 @author.  The patch does not contain any @author tags.
        
            +1 tests included.  The patch appears to include 3 new or modified tests.
        
            +1 javadoc.  The javadoc tool did not generate any warning messages.
        
            +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
        
            -1 findbugs.  The patch appears to introduce 225 new Findbugs (version 2.0.1) warnings.
        
        Show
        Jing Zhao added a comment - I also run test-patch for Andrew, and the result looks good: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 225 new Findbugs (version 2.0.1) warnings.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        I have committed this. Thanks, Andrew!

        Also thanks Jing for reviewing and testing the patches.

        Show
        Tsz Wo Nicholas Sze added a comment - I have committed this. Thanks, Andrew! Also thanks Jing for reviewing and testing the patches.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3181 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3181/)
        HDFS-4351. In BlockPlacementPolicyDefault.chooseTarget(..), numOfReplicas needs to be updated when avoiding stale nodes. Contributed by Andrew Wang (Revision 1429653)

        Result = SUCCESS
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429653
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
        Show
        Hudson added a comment - Integrated in Hadoop-trunk-Commit #3181 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3181/ ) HDFS-4351 . In BlockPlacementPolicyDefault.chooseTarget(..), numOfReplicas needs to be updated when avoiding stale nodes. Contributed by Andrew Wang (Revision 1429653) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429653 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #89 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/89/)
        HDFS-4351. In BlockPlacementPolicyDefault.chooseTarget(..), numOfReplicas needs to be updated when avoiding stale nodes. Contributed by Andrew Wang (Revision 1429653)

        Result = SUCCESS
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429653
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
        Show
        Hudson added a comment - Integrated in Hadoop-Yarn-trunk #89 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/89/ ) HDFS-4351 . In BlockPlacementPolicyDefault.chooseTarget(..), numOfReplicas needs to be updated when avoiding stale nodes. Contributed by Andrew Wang (Revision 1429653) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429653 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1278 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1278/)
        HDFS-4351. In BlockPlacementPolicyDefault.chooseTarget(..), numOfReplicas needs to be updated when avoiding stale nodes. Contributed by Andrew Wang (Revision 1429653)

        Result = FAILURE
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429653
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1278 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1278/ ) HDFS-4351 . In BlockPlacementPolicyDefault.chooseTarget(..), numOfReplicas needs to be updated when avoiding stale nodes. Contributed by Andrew Wang (Revision 1429653) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429653 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1307 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1307/)
        HDFS-4351. In BlockPlacementPolicyDefault.chooseTarget(..), numOfReplicas needs to be updated when avoiding stale nodes. Contributed by Andrew Wang (Revision 1429653)

        Result = SUCCESS
        szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429653
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1307 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1307/ ) HDFS-4351 . In BlockPlacementPolicyDefault.chooseTarget(..), numOfReplicas needs to be updated when avoiding stale nodes. Contributed by Andrew Wang (Revision 1429653) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429653 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java

          People

          • Assignee:
            Andrew Wang
            Reporter:
            Andrew Wang
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development