  1. Hadoop HDFS
  2. HDFS-10716

In Balancer, the target task should be removed when its size < 0.

    Details

    • Hadoop Flags:
      Reviewed

      Description

      In HDFS-10602, we found a failing case in which the balancer keeps moving data between 2 DNs, so the balancer never finishes. While debugging the code, I found what looks like a bug in how pending blocks are chosen in Dispatcher.Source.chooseNextMove.

      The code:

          private PendingMove chooseNextMove() {
            for (Iterator<Task> i = tasks.iterator(); i.hasNext();) {
              final Task task = i.next();
              final DDatanode target = task.target.getDDatanode();
              final PendingMove pendingBlock = new PendingMove(this, task.target);
              if (target.addPendingBlock(pendingBlock)) {
                // target is not busy, so do a tentative block allocation
                if (pendingBlock.chooseBlockAndProxy()) {
                  long blockSize = pendingBlock.reportedBlock.getNumBytes(this);
                  incScheduledSize(-blockSize);
                  task.size -= blockSize;
                  // The task is removed only when its size is exactly 0. If the size
                  // first drops below 0, the task should also be removed.
                  if (task.size == 0) {
                    i.remove();
                  }
                  return pendingBlock;
                  //...
      

      The value of task.size is assigned in Balancer#matchSourceWithTargetToMove:

          long size = Math.min(source.availableSizeToMove(), target.availableSizeToMove());
          final Task task = new Task(target, size);
      

      This value depends on the source and target nodes, and it is not guaranteed to reach exactly 0 as pending blocks are chosen, because each chosen block subtracts its full size. When the value drops below 0, the task is not removed, so the balancer keeps moving data to the target node even though more bytes than required have already been scheduled. In the end this unbalances the cluster again, which leads to another balancer iteration.
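
      The effect of the check can be illustrated with a small, self-contained sketch. This is not the Dispatcher code: the Task class, scheduleOneBlock, and countMoves below are hypothetical stand-ins that keep only the size bookkeeping, and the 100-byte block size is inferred from the debug log in this issue (before: 33, after: -67).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TaskRemovalSketch {
  /** Hypothetical stand-in for the balancer's Task; only the size field matters here. */
  static final class Task {
    long size;
    Task(long size) { this.size = size; }
  }

  /**
   * Schedules one block against the first task, mirroring the size bookkeeping
   * in chooseNextMove. Returns true if a move was scheduled.
   */
  static boolean scheduleOneBlock(List<Task> tasks, long blockSize,
                                  boolean removeWhenNegative) {
    for (Iterator<Task> i = tasks.iterator(); i.hasNext();) {
      Task task = i.next();
      task.size -= blockSize;
      // Original check: size == 0. Proposed check: size <= 0.
      boolean done = removeWhenNegative ? task.size <= 0 : task.size == 0;
      if (done) {
        i.remove();
      }
      return true;
    }
    return false; // no task left, so nothing more is scheduled
  }

  /** Counts how many blocks are scheduled before the task list empties, capped at maxMoves. */
  static int countMoves(long initialSize, long blockSize,
                        boolean removeWhenNegative, int maxMoves) {
    List<Task> tasks = new ArrayList<>();
    tasks.add(new Task(initialSize));
    int moves = 0;
    while (moves < maxMoves
        && scheduleOneBlock(tasks, blockSize, removeWhenNegative)) {
      moves++;
    }
    return moves;
  }

  public static void main(String[] args) {
    // 33 bytes left to move, 100-byte blocks.
    System.out.println("== 0 check: " + countMoves(33, 100, false, 10) + " moves");
    System.out.println("<= 0 check: " + countMoves(33, 100, true, 10) + " moves");
  }
}
```

      With the == 0 check the size goes 33, -67, -167, and so on, so the task is never removed and the sketch stops only at the cap of 10 moves. With the <= 0 check the task is removed after the first move.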

      We can optimize this as the title says; I think it will speed up the balancer.

      See the attached logs for the failing case, or see HDFS-10602. (Focus on the change records for the scheduled size of the target node. That is debug info I added, like this:)

      2016-08-01 16:51:57,492 [pool-51-thread-1] INFO  balancer.Dispatcher (Dispatcher.java:chooseNextMove(799)) - TargetNode: 58794, bytes scheduled to move, after: -67, before: 33
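
      To find such records in failing.log, a sketch like the following could be used. It assumes the message format of the debug line quoted above, which comes from a local debugging change and is not part of stock Hadoop logging; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScheduledSizeLogSketch {
  // Matches the custom debug message quoted in this issue.
  private static final Pattern LINE = Pattern.compile(
      "TargetNode: (\\d+), bytes scheduled to move, after: (-?\\d+), before: (-?\\d+)");

  /** Returns {before, after, before - after}, or null if the line does not match. */
  static long[] parse(String logLine) {
    Matcher m = LINE.matcher(logLine);
    if (!m.find()) {
      return null;
    }
    long after = Long.parseLong(m.group(2));
    long before = Long.parseLong(m.group(3));
    return new long[] {before, after, before - after};
  }

  /** True when the scheduled size for the target dropped below 0. */
  static boolean overshoots(String logLine) {
    long[] parsed = parse(logLine);
    return parsed != null && parsed[1] < 0;
  }

  public static void main(String[] args) {
    String line = "2016-08-01 16:51:57,492 [pool-51-thread-1] INFO  balancer.Dispatcher "
        + "(Dispatcher.java:chooseNextMove(799)) - TargetNode: 58794, "
        + "bytes scheduled to move, after: -67, before: 33";
    long[] p = parse(line);
    System.out.println("before=" + p[0] + " after=" + p[1] + " moved=" + p[2]
        + " overshoots=" + overshoots(line));
  }
}
```

      For the line above, the difference between before and after is 100 bytes, and the scheduled size ends up negative.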
      
      1. failing.log
        49 kB
        Yiqun Lin
      2. HDFS-10716.001.patch
        0.8 kB
        Yiqun Lin

        Issue Links

          Activity

          linyiqun Yiqun Lin added a comment -

          Attached an initial patch; thanks for reviewing.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime  Comment
          0     reexec      1m 5s    Docker mode activated.
          +1    @author     0m 0s    The patch does not contain any @author tags.
          -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1    mvninstall  7m 25s   trunk passed
          +1    compile     0m 52s   trunk passed
          +1    checkstyle  0m 27s   trunk passed
          +1    mvnsite     0m 56s   trunk passed
          +1    mvneclipse  0m 14s   trunk passed
          +1    findbugs    2m 5s    trunk passed
          +1    javadoc     0m 59s   trunk passed
          +1    mvninstall  0m 53s   the patch passed
          +1    compile     0m 47s   the patch passed
          +1    javac       0m 47s   the patch passed
          +1    checkstyle  0m 25s   the patch passed
          +1    mvnsite     0m 58s   the patch passed
          +1    mvneclipse  0m 11s   the patch passed
          +1    whitespace  0m 0s    The patch has no whitespace issues.
          +1    findbugs    2m 15s   the patch passed
          +1    javadoc     0m 58s   the patch passed
          +1    unit        63m 20s  hadoop-hdfs in the patch passed.
          +1    asflicense  0m 22s   The patch does not generate ASF License warnings.
                            85m 31s



          Subsystem       Report/Notes
          Docker          Image:yetus/hadoop:9560f25
          JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12821604/HDFS-10716.001.patch
          JIRA Issue      HDFS-10716
          Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname           Linux 7b3f5bcd5a80 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool      maven
          Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision    trunk / 7fc70c6
          Default Java    1.8.0_101
          findbugs        v3.0.0
          Test Results    https://builds.apache.org/job/PreCommit-HDFS-Build/16289/testReport/
          modules         C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/16289/console
          Powered by      Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          linyiqun Yiqun Lin added a comment -

          Ping Tsz Wo Nicholas Sze: could you take a look at this JIRA? I think this can improve the balancer. Looking forward to your response, thanks.

          szetszwo Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good. Thanks for fixing the bug!

          szetszwo Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Yiqun!

          linyiqun Yiqun Lin added a comment -

          Thanks Tsz Wo Nicholas Sze for the quick review and commit!

          xiaochen Xiao Chen added a comment -

          Thanks Yiqun and Nicholas for the contribution.
          As a note for future search: the commit message does not have the jira number in it. (Code change is good.)

          zhz Zhe Zhang added a comment -

          I backported this to branch-2.7.

          djp Junping Du added a comment -

          Sounds like we were missing the JIRA number in the commit log.

          commit cefa21e98a12b06602ee8000f8cef6c3b17af999
          Author: Tsz-Wo Nicholas Sze <szetszwo@hortonworks.com>
          Date:   Thu Aug 4 09:45:40 2016 -0700
          
              In Balancer, the target task should be removed when its size < 0.  Contributed by Yiqun Lin
          

            People

            • Assignee:
              linyiqun Yiqun Lin
              Reporter:
              linyiqun Yiqun Lin
            • Votes:
              0
              Watchers:
              7
