Hadoop HDFS / HDFS-2012

Recurring failure of TestBalancer due to incorrect treatment of nodes whose utilization equals avgUtilization.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0, 0.24.0
    • Component/s: balancer, test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This has been failing on Hudson for the last two builds and fails on my local box as well.

      1. HDFS-2012-Trunk.patch
        1.0 kB
        Uma Maheswara Rao G
      2. HDFS-2012-0.22Branch.patch
        0.9 kB
        Uma Maheswara Rao G
      3. HDFS-2012.patch
        3 kB
        Uma Maheswara Rao G
      4. TestBalancerLog.html
        261 kB
        Konstantin Shvachko

          Activity

          Uma Maheswara Rao G added a comment -

          Here is the JIRA for handling it: HDFS-2491

          Konstantin Shvachko added a comment -

          Yes, we missed it. Open another jira, please.

          Uma Maheswara Rao G added a comment -

          I think we need to handle one more condition:

           if (getUtilization(datanode) > avgUtilization) {
             datanodeS = new Source(datanode, avgUtilization, threshold);
             if (isAboveAvgUtilized(datanodeS)) {

          We handled the case where the DN's utilization equals avgUtilization only inside the isAboveAvgUtilized method,
          but the same comparison also happens in the outer check, before that method is called.
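
To illustrate the point, here is a minimal, self-contained sketch of the routing problem. It is a simplified model, not the Balancer source: the class EqualityRouting, the route method, the outerInclusive flag, and the plain double parameters are hypothetical, and the behaviour of the else branch (a strict below-average check followed by an under-utilized check) is an assumption inferred from the assertion message reported in this issue.

```java
final class EqualityRouting {
    /** Stand-in for the above-average predicate after the first fix: inclusive at avg. */
    static boolean isAboveAvgUtilized(double u, double avg, double threshold) {
        return u >= avg && u <= avg + threshold;
    }

    /** Assumed below-average predicate: strict at avg, inclusive at the lower bound. */
    static boolean isBelowAvgUtilized(double u, double avg, double threshold) {
        return u < avg && u >= avg - threshold;
    }

    /**
     * Routes a node through an outer check followed by an inner predicate.
     * outerInclusive=false models the ">" comparison quoted above;
     * outerInclusive=true models a ">=" comparison.
     */
    static String route(double u, double avg, double threshold, boolean outerInclusive) {
        boolean sourceSide = outerInclusive ? u >= avg : u > avg;
        if (sourceSide) {
            return isAboveAvgUtilized(u, avg, threshold) ? "aboveAvgUtilized" : "overUtilized";
        }
        if (isBelowAvgUtilized(u, avg, threshold)) {
            return "belowAvgUtilized";
        }
        if (u < avg - threshold) {
            return "underUtilized";
        }
        return "UNCLASSIFIED"; // no branch accepts the node; this is where an assert would fire
    }

    public static void main(String[] args) {
        // A node sitting exactly at the average, with a 10-point threshold.
        System.out.println("outer >  : " + route(50.0, 50.0, 10.0, false));
        System.out.println("outer >= : " + route(50.0, 50.0, 10.0, true));
    }
}
```

In this model, making only the inner predicate inclusive does not help: with the strict outer check, a node at exactly the average never reaches isAboveAvgUtilized and ends up unclassified. Relaxing the outer comparison to ">=" routes it to the above-average group.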

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #831 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/831/)
          HDFS-2012. Balancer incorrectly treats nodes whose utilization equals avgUtilization. Contributed by Uma Maheswara Rao G.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183457
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #861 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/861/)
          HDFS-2012. Balancer incorrectly treats nodes whose utilization equals avgUtilization. Contributed by Uma Maheswara Rao G.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183457
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-22-branch #98 (See https://builds.apache.org/job/Hadoop-Hdfs-22-branch/98/)
          HDFS-2012. Balancer incorrectly treats nodes whose utilization equals avgUtilization. Contributed by Uma Maheswara Rao G.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183450
          Files :

          • /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1102 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1102/)
          HDFS-2012. Balancer incorrectly treats nodes whose utilization equals avgUtilization. Contributed by Uma Maheswara Rao G.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183457
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1161 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1161/)
          HDFS-2012. Balancer incorrectly treats nodes whose utilization equals avgUtilization. Contributed by Uma Maheswara Rao G.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183457
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1083 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1083/)
          HDFS-2012. Balancer incorrectly treats nodes whose utilization equals avgUtilization. Contributed by Uma Maheswara Rao G.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183457
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Konstantin Shvachko added a comment -

          I just committed this. Thank you Uma.

          Uma Maheswara Rao G added a comment -

          Hi Konstantin,

          Thanks a lot for the review!
          The test failures are not related to this change. They are mainly caused by protocol changes, not by the Balancer, and they fail without the patch as well.

          Thanks
          Uma

          Konstantin Shvachko added a comment -

          +1 for the patches.
          Are the test failures related to the patch?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12498870/HDFS-2012-Trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.TestRestartDFS
          org.apache.hadoop.hdfs.TestFileCreationClient
          org.apache.hadoop.hdfs.TestSetrepIncreasing
          org.apache.hadoop.hdfs.TestDistributedFileSystem
          org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1364//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1364//console

          This message is automatically generated.

          Uma Maheswara Rao G added a comment -

          Konstantin, thanks a lot for taking a look!

          I renamed them mainly for readability, so that the names match their meaning after the change. Considering your concerns, I will keep the old names.

          Looking at the code, this applies to trunk as well. I will prepare patches for both.

          Thanks
          Uma

          Konstantin Shvachko added a comment -

          The patch looks good. Could you please avoid renaming the method and the variable? Keeping the same names, and changing only the implementation and the comment, will help porting between versions.
          Speaking of versions, could you please check if trunk has the same problem?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12498452/HDFS-2012.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1357//console

          This message is automatically generated.

          Konstantin Shvachko added a comment -

          In Balancer.initNodes(.) there is logic that classifies nodes into 4 groups:

          1. underUtilized, whose utilization is < (avgUtilization - threshold)
          2. utilized below average, whose utilization is < avgUtilization, but >= (avgUtilization - threshold)
          3. utilized above average, whose utilization is > avgUtilization, but <= (avgUtilization + threshold)
          4. overUtilized, whose utilization is > (avgUtilization + threshold)

          The problem here is that if a node's utilization is exactly equal to avgUtilization, then it does not belong to any of the four groups and the assert kicks in.
          The fix is to include such a node in group 2 or 3. I'd prefer 3.
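
The four groups and the proposed fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the Balancer implementation: the class UtilizationGroups, the Group enum, and the classify helper are hypothetical names, and utilization values are treated as plain percentages. The only change from the list above is that group 3 uses ">=" at avgUtilization, so the equality case lands there.

```java
final class UtilizationGroups {
    /** The four groups from the list above (hypothetical names for this sketch). */
    enum Group { UNDER_UTILIZED, BELOW_AVERAGE, ABOVE_AVERAGE, OVER_UTILIZED }

    /**
     * Classifies a node by its utilization. Every possible value falls into
     * exactly one group because the boundaries are checked from the top down.
     */
    static Group classify(double utilization, double avgUtilization, double threshold) {
        if (utilization > avgUtilization + threshold) {
            return Group.OVER_UTILIZED;      // group 4
        }
        if (utilization >= avgUtilization) {
            return Group.ABOVE_AVERAGE;      // group 3, including utilization == avgUtilization
        }
        if (utilization >= avgUtilization - threshold) {
            return Group.BELOW_AVERAGE;      // group 2
        }
        return Group.UNDER_UTILIZED;         // group 1
    }

    public static void main(String[] args) {
        double avg = 50.0;
        double threshold = 10.0;
        for (double u : new double[] {35.0, 45.0, 50.0, 55.0, 65.0}) {
            System.out.println(u + " -> " + classify(u, avg, threshold));
        }
    }
}
```

With an average of 50 and a threshold of 10, a node at exactly 50 is classified as ABOVE_AVERAGE rather than falling through all four checks.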

          Konstantin Shvachko added a comment -

          Here is the history of failures
          The reason is an assert in the test:

          junit.framework.AssertionFailedError: 127.0.0.1:53207is not an underUtilized node
          	at org.apache.hadoop.hdfs.server.balancer.Balancer.initNodes(Balancer.java:1011)
          	at org.apache.hadoop.hdfs.server.balancer.Balancer.initNodes(Balancer.java:953)
          	at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:1496)
          	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:247)
          	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.test(TestBalancer.java:234)
          	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:307)
          	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.__CLR2_4_39j3j5b10o2(TestBalancer.java:327)
          	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancer0(TestBalancer.java:324)
          

          See the full run log attached.

          Konstantin Shvachko added a comment -

          I am reopening it now as Jenkins just failed and we got a clear log for the issue.

          Konstantin Shvachko added a comment -

          Don't see failures any more. Feel free to reopen if it fails for you.

          Uma Maheswara Rao G added a comment -

          Hi Konstantin/Aaron,
          It is passing on my local box.
          Is this test still failing on your local boxes?

          Thanks
          Uma


            People

            • Assignee:
              Uma Maheswara Rao G
              Reporter:
              Aaron T. Myers
            • Votes:
              0
              Watchers:
              8
