HDFS-9761 (project: Hadoop HDFS)

Rebalancer sleeps too long between iterations

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: balancer & mover
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Whilst reading the code to try to determine why I'm seeing bad performance, I think I've spotted an inadvertent move of a sleep() call to inside a for loop.

      This means that instead of sleeping once per iteration, the balancer sleeps once per namenode per iteration. If I've done my maths correctly, the default sleep time is 18 seconds, so this is a nice little speedup for HA clusters.

      Commit d31a41c35927f02f2fb40d19380b5df4bb2b6d57 moved the sleep in org/apache/hadoop/hdfs/server/balancer/Balancer.java:run() into the per-namenode loop.
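To make the reported regression concrete, here is a minimal Java sketch of the two loop shapes. It is not the actual Balancer.run() code: the class and method names (BalancerSleepSketch, RecordingSleeper, balanceOnce) are hypothetical, the namenode URIs are placeholders, RecordingSleeper stands in for Thread.sleep() so the sketch records the requested delay instead of blocking, and the 9,000 ms sleep is an illustrative value rather than a claimed default.

```java
import java.util.Arrays;
import java.util.List;

public class BalancerSleepSketch {

    /** Hypothetical stand-in for Thread.sleep(): records the delay instead of blocking. */
    static final class RecordingSleeper {
        long totalMillis = 0;

        void sleep(long millis) {
            totalMillis += millis;
        }
    }

    /** Hypothetical placeholder for one balancing pass against a single namenode. */
    static void balanceOnce(String namenode) {
        // The real Balancer would dispatch block moves for this namenode here.
    }

    /** Regressed shape: the sleep runs once per namenode in every iteration. */
    static void iterationSleepInsideLoop(List<String> namenodes, long sleeptime,
                                         RecordingSleeper sleeper) {
        for (String nn : namenodes) {
            balanceOnce(nn);
            sleeper.sleep(sleeptime);
        }
    }

    /** Intended shape: process every namenode, then sleep once per iteration. */
    static void iterationSleepOutsideLoop(List<String> namenodes, long sleeptime,
                                          RecordingSleeper sleeper) {
        for (String nn : namenodes) {
            balanceOnce(nn);
        }
        sleeper.sleep(sleeptime);
    }

    public static void main(String[] args) {
        // Placeholder URIs; two entries model a balancer iterating over two namenodes.
        List<String> namenodes = Arrays.asList("hdfs://nn1", "hdfs://nn2");
        long sleeptime = 9000L; // illustrative value only, not a claimed default

        RecordingSleeper inside = new RecordingSleeper();
        RecordingSleeper outside = new RecordingSleeper();
        iterationSleepInsideLoop(namenodes, sleeptime, inside);
        iterationSleepOutsideLoop(namenodes, sleeptime, outside);

        System.out.println("sleep inside loop:  " + inside.totalMillis + " ms");
        System.out.println("sleep outside loop: " + outside.totalMillis + " ms");
    }
}
```

Run as-is, this prints 18000 ms for the in-loop shape and 9000 ms for the intended shape. More generally, the in-loop version waits N times as long per iteration when the balancer iterates over N namenodes, which matches the "once per namenode per iteration" behaviour described above.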

      Attachments: HDFS-9761.000.patch (1.0 kB, Mingliang Liu)


          Activity

          cnauroth Chris Nauroth added a comment -

          Adrian Bridgett, yeah, I'm hooked! Please do feel free to file more bug reports if you find other problems. We welcome contributions. If you get the urge to patch something, here are instructions on how the contribution process works.

          https://wiki.apache.org/hadoop/HowToContribute

          abridgett Adrian Bridgett added a comment -

          You're most welcome. Really impressive JIRA/Hudson setup, and especially how well and quickly you all communicate. Awesome to see in action.

          liuml07 Mingliang Liu added a comment -

          Thanks, Chris Trezzo, for the review, and Chris Nauroth for the review and commit.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9251 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9251/)
          HDFS-9761. Rebalancer sleeps too long between iterations. Contributed by (cnauroth: rev c6497949e866594050153b953a85c0a1db59d2f8)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          cnauroth Chris Nauroth added a comment -

          I have committed this to trunk, branch-2 and branch-2.8. Mingliang Liu and Chris Trezzo, thank you for helping to resolve this quickly before the 2.8.0 release candidate. Adrian Bridgett, thank you for the bug report.

          liuml07 Mingliang Liu added a comment -

          hadoop.hdfs.TestRollingUpgrade seems flaky and is tracked by HDFS-9664. The other failing tests seem unrelated, so I filed individual JIRAs:

          • TestBlockScanner is tracked by HDFS-9765
          • TestDataNodeMetrics#testDataNodeTimeSpend by HDFS-9766
          • TestFileAppend#testMultipleAppends by HDFS-9767
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote | Subsystem  | Runtime  | Comment
          0    | reexec     | 0m 21s   | Docker mode activated.
          +1   | @author    | 0m 0s    | The patch does not contain any @author tags.
          -1   | test4tests | 0m 0s    | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          0    | mvndep     | 0m 9s    | Maven dependency ordering for branch
          +1   | mvninstall | 6m 51s   | trunk passed
          +1   | compile    | 0m 50s   | trunk passed with JDK v1.8.0_66
          +1   | compile    | 0m 40s   | trunk passed with JDK v1.7.0_91
          +1   | checkstyle | 0m 26s   | trunk passed
          +1   | mvnsite    | 0m 51s   | trunk passed
          +1   | mvneclipse | 0m 12s   | trunk passed
          +1   | findbugs   | 1m 53s   | trunk passed
          +1   | javadoc    | 1m 8s    | trunk passed with JDK v1.8.0_66
          +1   | javadoc    | 1m 49s   | trunk passed with JDK v1.7.0_91
          0    | mvndep     | 0m 7s    | Maven dependency ordering for patch
          +1   | mvninstall | 0m 45s   | the patch passed
          +1   | compile    | 0m 44s   | the patch passed with JDK v1.8.0_66
          +1   | javac      | 0m 44s   | the patch passed
          +1   | compile    | 0m 39s   | the patch passed with JDK v1.7.0_91
          +1   | javac      | 0m 39s   | the patch passed
          +1   | checkstyle | 0m 18s   | the patch passed
          +1   | mvnsite    | 0m 50s   | the patch passed
          +1   | mvneclipse | 0m 11s   | the patch passed
          +1   | whitespace | 0m 0s    | Patch has no whitespace issues.
          +1   | findbugs   | 2m 3s    | the patch passed
          +1   | javadoc    | 1m 4s    | the patch passed with JDK v1.8.0_66
          +1   | javadoc    | 1m 45s   | the patch passed with JDK v1.7.0_91
          -1   | unit       | 62m 53s  | hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1   | unit       | 57m 24s  | hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1   | asflicense | 0m 20s   | Patch does not generate ASF License warnings.
               |            | 146m 20s | (total)



          Reason                           | Tests
          JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestRollingUpgrade
                                           | hadoop.hdfs.TestFileAppend
                                           | hadoop.hdfs.server.datanode.TestDataNodeMetrics
                                           | hadoop.hdfs.server.datanode.TestBlockScanner
          JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.TestFileAppend



          Subsystem                  | Report/Notes
          Docker Image               | yetus/hadoop:0ca8df7
          JIRA Patch URL             | https://issues.apache.org/jira/secure/attachment/12786351/HDFS-9761.000.patch
          JIRA Issue                 | HDFS-9761
          Optional Tests             | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname                      | Linux afff96011985 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool                 | maven
          Personality                | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision               | trunk / 1bcfab8
          Default Java               | 1.7.0_91
          Multi-JDK versions         | /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs                   | v3.0.0
          unit                       | https://builds.apache.org/job/PreCommit-HDFS-Build/14392/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit                       | https://builds.apache.org/job/PreCommit-HDFS-Build/14392/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs             | https://builds.apache.org/job/PreCommit-HDFS-Build/14392/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
                                     | https://builds.apache.org/job/PreCommit-HDFS-Build/14392/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/14392/testReport/
          modules                    | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used            | 77MB
          Console output             | https://builds.apache.org/job/PreCommit-HDFS-Build/14392/console
          Powered by                 | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          cnauroth Chris Nauroth added a comment -

          +1 for the patch, pending pre-commit.

          ctrezzo Chris Trezzo added a comment -

          +1. The patch looks good to me. I agree; I think the intent was to have the sleep call outside of the for loop.

          ctrezzo Chris Trezzo added a comment -

          Chris Nauroth will take a look. Thanks!

          cnauroth Chris Nauroth added a comment -

          I think we need to raise this to blocker status for release 2.8.0, considering that the balancer is supposed to get faster in this version after patches like HDFS-8818 and HDFS-8824.

          Chris Trezzo and Ming Ma, would you mind reviewing this bug report since you had done the work on HDFS-8890?

          liuml07 Mingliang Liu added a comment -

          Thanks for reporting this, Adrian Bridgett.

          It seems the sleep() change was indeed inadvertent. I see no obvious reason why HDFS-8890 would have moved the sleep() into the loop.


            People

            • Assignee:
              liuml07 Mingliang Liu
              Reporter:
              abridgett Adrian Bridgett
            • Votes:
              0
              Watchers:
              8
