  1. Hadoop HDFS
  2. HDFS-10966

Enhance Dispatcher logic on deciding when to give up a source DataNode

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When a Dispatcher thread works on a source DataNode, it tries to execute a PendingMove in each iteration. If no block is moved after 5 iterations, this source (over-utilized) DataNode is abandoned for the rest of the current Balancer iteration (20 mins). This is problematic if the source DataNode was heavily loaded at the beginning of the iteration: it will quickly encounter 5 unsuccessful moves and be abandoned.

      We should enhance this logic, e.g. by using elapsed time instead of the number of iterations.

      // Check if the previous move was successful
      } else {
        // source node cannot find a pending block to move, iteration +1
        noPendingMoveIteration++;
        // in case no blocks can be moved for source node's task,
        // jump out of while-loop after 5 iterations.
        if (noPendingMoveIteration >= MAX_NO_PENDING_MOVE_ITERATIONS) {
          LOG.info("Failed to find a pending move " + noPendingMoveIteration
              + " times.  Skipping " + this);
          resetScheduledSize();
        }
      }
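
The elapsed-time idea proposed above could be sketched roughly as follows. This is only an illustration, not the content of the attached patches: the class `NoPendingMoveTimer`, its methods, and the `maxNoPendingMoveMillis` budget are hypothetical names, and the clock is injected so the behavior can be checked without waiting.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Hadoop): tracks how long a source
 * DataNode has gone without a PendingMove being found.
 */
class NoPendingMoveTimer {
  private final long maxNoPendingMoveMillis; // assumed time budget
  private final LongSupplier clock;          // injectable millisecond clock
  private long lastProgressMillis;

  NoPendingMoveTimer(long maxNoPendingMoveMillis, LongSupplier clock) {
    this.maxNoPendingMoveMillis = maxNoPendingMoveMillis;
    this.clock = clock;
    this.lastProgressMillis = clock.getAsLong();
  }

  /** Call whenever a PendingMove was found and executed. */
  void recordProgress() {
    lastProgressMillis = clock.getAsLong();
  }

  /** True once no move has been found for the whole time budget. */
  boolean shouldGiveUp() {
    return clock.getAsLong() - lastProgressMillis >= maxNoPendingMoveMillis;
  }
}
```

In the loop quoted above, the `noPendingMoveIteration >= MAX_NO_PENDING_MOVE_ITERATIONS` check would then become a call to `shouldGiveUp()`, with `recordProgress()` called where the counter is currently reset. A monotonic source such as `() -> System.nanoTime() / 1_000_000` is a reasonable clock, since it is unaffected by wall-clock adjustments. With a time budget, a source that is busy for the first few seconds is retried until the budget runs out instead of being dropped after five quick failures.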
      

        Attachments

        1. HDFS-10966.00.patch
          9 kB
          Zhe Zhang
        2. HDFS-10966.01.patch
          11 kB
          Zhe Zhang
        3. HDFS-10966-branch-2.7.00.patch
          12 kB
          Zhe Zhang

              People

              • Assignee:
                mwagner Mark Wagner
                Reporter:
                zhz Zhe Zhang
              • Votes:
                0
                Watchers:
                5
