  1. Hadoop HDFS
  2. HDFS-6621

Hadoop Balancer prematurely exits iterations


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.4.0
    • Fix Version/s: 2.6.0
    • Component/s: balancer & mover
    • Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0
    • Labels: balancer

    Description

      I have been having an issue with balancing being too slow. The problem was not the speed at which blocks were moved; rather, the balancer would prematurely exit its balancing iterations. It would move ~10 blocks or 100 MB and then exit the current iteration (in which it said it was planning on moving about 10 GB).

      I looked at the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, "noPendingBlockIteration", which counts the number of loop iterations in which no pending block to move can be found. Once this number reaches 5, the balancer exits the overall balancing iteration. I believe the desired behavior is to exit after 5 consecutive no-pending-block iterations; however, this variable is never reset to 0 when a block is moved. So once the count reaches 5, even if thousands of blocks have been moved between these no-pending-block iterations, the overall balancing iteration ends prematurely.

      The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. This way, my iterations do not exit prematurely unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.

          private void dispatchBlocks() {
            long startTime = Time.now();
            long scheduledSize = getScheduledSize();
            this.blocksToReceive = 2*scheduledSize;
            boolean isTimeUp = false;
            int noPendingBlockIteration = 0;
            while(!isTimeUp && getScheduledSize()>0 &&
                (!srcBlockList.isEmpty() || blocksToReceive>0)) {
              PendingBlockMove pendingBlock = chooseNextBlockToMove();
              if (pendingBlock != null) {
                // reset so that only consecutive misses count toward the limit
                noPendingBlockIteration = 0;
                // move the block
                pendingBlock.scheduleBlockMove();
                continue;
              }
      
              /* Since we can not schedule any block to move,
               * filter any moved blocks from the source block list and
               * check if we should fetch more blocks from the namenode
               */
              filterMovedBlocks(); // filter already moved blocks
              if (shouldFetchMoreBlocks()) {
                // fetch new blocks
                try {
                  blocksToReceive -= getBlockList();
                  continue;
                } catch (IOException e) {
                  LOG.warn("Exception while getting block list", e);
                  return;
                }
              } else {
                // source node cannot find a pendingBlockToMove, iteration +1
                noPendingBlockIteration++;
                // in case no blocks can be moved for source node's task,
                // jump out of while-loop after 5 consecutive iterations.
                if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
                  setScheduledSize(0);
                }
              }
      
              // check if time is up or not
              if (Time.now()-startTime > MAX_ITERATION_TIME) {
                isTimeUp = true;
                continue;
              }
      
              /* Now we can not schedule any block to move and there are
               * no new blocks added to the source block list, so we wait.
               */
              try {
                synchronized(Balancer.this) {
                  Balancer.this.wait(1000);  // wait for targets/sources to be idle
                }
              } catch (InterruptedException ignored) {
              }
            }
          }
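
      To illustrate the difference between a cumulative and a consecutive miss counter, here is a minimal, self-contained sketch. It is not the Balancer code: the class name, the passesBeforeExit() helper, and the boolean array of per-pass outcomes are hypothetical, and the loop is simplified (every pass that finds no block counts as a miss, whereas the real code counts one only when no more blocks can be fetched). The constant mirrors MAX_NO_PENDING_BLOCK_ITERATIONS with the value 5 described above.

```java
public class NoPendingBlockCounterDemo {
    // Hypothetical stand-in for the Balancer's limit of 5 described above.
    static final int MAX_NO_PENDING_BLOCK_ITERATIONS = 5;

    /**
     * Simulates the dispatch loop over a fixed sequence of outcomes.
     * outcomes[i] == true means a pending block was found and scheduled on pass i.
     * Returns the number of passes executed before the loop gives up,
     * or outcomes.length if it never gives up.
     */
    static int passesBeforeExit(boolean[] outcomes, boolean resetOnSuccess) {
        int noPendingBlockIteration = 0;
        for (int i = 0; i < outcomes.length; i++) {
            if (outcomes[i]) {
                if (resetOnSuccess) {
                    // the proposed fix: a successful move clears the miss count
                    noPendingBlockIteration = 0;
                }
                continue;
            }
            noPendingBlockIteration++;
            if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
                return i + 1; // give up after this pass
            }
        }
        return outcomes.length;
    }

    public static void main(String[] args) {
        // Three successful moves followed by one miss, repeated ten times.
        boolean[] outcomes = new boolean[40];
        for (int i = 0; i < outcomes.length; i++) {
            outcomes[i] = (i % 4) != 3;
        }
        System.out.println(passesBeforeExit(outcomes, false)); // prints 20
        System.out.println(passesBeforeExit(outcomes, true));  // prints 40
    }
}
```

      Without the reset, the fifth miss overall ends the loop after 20 passes even though blocks were moved between every miss. With the reset, the loop runs through all 40 passes because the misses are never consecutive.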
      

      Attachments

        1. HDFS-6621_problem1.patch
          0.8 kB
          Rafal Wojdyla
        2. HDFS-6621.patch
          0.7 kB
          Benjamin Bowman
        3. HDFS-6621.patch_2
          1 kB
          Rafal Wojdyla
        4. HDFS-6621.patch_3
          2 kB
          Rafal Wojdyla
        5. HDFS-6621.patch_4
          2 kB
          Rafal Wojdyla

          People

            Assignee: Rafal Wojdyla (ravwojdyla)
            Reporter: Benjamin Bowman (bbowman410)
            Votes: 0
            Watchers: 14

