  1. Hadoop HDFS
  2. HDFS-10966

Enhance Dispatcher logic on deciding when to give up a source DataNode


Details

    • Reviewed

    Description

      When a Dispatcher thread works on a source DataNode, each iteration of its loop tries to execute a PendingMove. If no block is moved after 5 iterations, this source (over-utilized) DataNode is given up for the rest of the current Balancer iteration (20 minutes). This is problematic if the source DataNode is heavily loaded at the beginning of the iteration: it quickly accumulates 5 unsuccessful attempts and is abandoned, even though it might be able to serve moves later in the same iteration.

      We should enhance this logic, e.g. by using elapsed time instead of the number of iterations.

      // Check if the previous move was successful
      } else {
        // source node cannot find a pending block to move, iteration +1
        noPendingMoveIteration++;
        // in case no blocks can be moved for source node's task,
        // jump out of while-loop after 5 iterations.
        if (noPendingMoveIteration >= MAX_NO_PENDING_MOVE_ITERATIONS) {
          LOG.info("Failed to find a pending move " + noPendingMoveIteration
              + " times.  Skipping " + this);
          resetScheduledSize();
        }
      }
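
      As a rough illustration of the elapsed-time idea proposed above, the give-up decision could be driven by how long the source has gone without finding a pending move, rather than by a counter. The sketch below is not taken from the attached patches: the class SourceGiveUpPolicy, its methods, and the idle threshold are hypothetical names chosen for illustration. The clock value is passed in by the caller so the logic can be exercised without sleeping.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Hadoop): decides when a Dispatcher
 * thread should give up a source DataNode, based on elapsed time since
 * the last pending move was found instead of a fixed iteration count.
 */
final class SourceGiveUpPolicy {
  private final long maxIdleNanos;   // assumed threshold, chosen by the caller
  private long lastPendingMoveNanos; // time of the last pending move found

  SourceGiveUpPolicy(long maxIdleMillis, long nowNanos) {
    this.maxIdleNanos = TimeUnit.MILLISECONDS.toNanos(maxIdleMillis);
    this.lastPendingMoveNanos = nowNanos;
  }

  /** Call whenever a pending move was found for the source. */
  void recordPendingMove(long nowNanos) {
    lastPendingMoveNanos = nowNanos;
  }

  /** True once no pending move has been found for maxIdleMillis. */
  boolean shouldGiveUp(long nowNanos) {
    return nowNanos - lastPendingMoveNanos >= maxIdleNanos;
  }
}
```

      In a dispatch loop, the caller would pass System.nanoTime() to both methods and reset the scheduled size once shouldGiveUp returns true. With a time-based threshold, a source that is busy for the first few attempts is not dropped as long as it finds a pending move before the idle window expires.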
      

      Attachments

        1. HDFS-10966.00.patch
          9 kB
          Zhe Zhang
        2. HDFS-10966.01.patch
          11 kB
          Zhe Zhang
        3. HDFS-10966-branch-2.7.00.patch
          12 kB
          Zhe Zhang

          People

            mwagner Mark Wagner
            zhz Zhe Zhang