  1. Hadoop HDFS
  2. HDFS-10966

Enhance Dispatcher logic on deciding when to give up a source DataNode

Details

    • Reviewed

    Description

      When a Dispatcher thread works on a source DataNode, it tries to execute a PendingMove in each iteration. If no block is moved after 5 iterations, this source (over-utilized) DataNode is given up for the current Balancer iteration (20 minutes). This is problematic if the source DataNode is heavily loaded at the beginning of the iteration: it will quickly encounter 5 unsuccessful moves and be abandoned.

      We should enhance this logic, for example by using elapsed time instead of the number of iterations.

      // Check if the previous move was successful
      } else {
        // source node cannot find a pending block to move, iteration +1
        noPendingMoveIteration++;
        // in case no blocks can be moved for source node's task,
        // jump out of while-loop after 5 iterations.
        if (noPendingMoveIteration >= MAX_NO_PENDING_MOVE_ITERATIONS) {
          LOG.info("Failed to find a pending move " + noPendingMoveIteration
              + " times.  Skipping " + this);
          resetScheduledSize();
        }
      }
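
To illustrate the elapsed-time idea proposed above, here is a minimal sketch of a give-up policy driven by idle time rather than an iteration count. It is not the code in the attached patches: the class name IdleTimeGiveUpPolicy, its methods, and the injected clock are all hypothetical, and the idle limit is left for the caller to choose.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Hadoop): decides when to give up a
 * source DataNode based on how long it has gone without a scheduled move,
 * instead of counting iterations that found no pending move.
 */
public class IdleTimeGiveUpPolicy {
  private final long maxIdleNanos;
  private final LongSupplier nanoClock; // e.g. System::nanoTime in production
  private long lastProgressNanos;

  public IdleTimeGiveUpPolicy(long maxIdle, TimeUnit unit,
      LongSupplier nanoClock) {
    this.maxIdleNanos = unit.toNanos(maxIdle);
    this.nanoClock = nanoClock;
    // Treat construction time as the last moment progress was made.
    this.lastProgressNanos = nanoClock.getAsLong();
  }

  /** Call whenever a pending move was found and scheduled. */
  public void recordProgress() {
    lastProgressNanos = nanoClock.getAsLong();
  }

  /** True once no move has been scheduled for at least the idle limit. */
  public boolean shouldGiveUp() {
    return nanoClock.getAsLong() - lastProgressNanos >= maxIdleNanos;
  }
}
```

In the loop quoted above, such a policy might replace the noPendingMoveIteration counter: recordProgress() would be called where a move is scheduled, and the check against MAX_NO_PENDING_MOVE_ITERATIONS would become a call to shouldGiveUp(). A source that is busy at the start of the Balancer iteration would then be retried until the idle limit expires, rather than being dropped after 5 quick attempts.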
      

      Attachments

        1. HDFS-10966-branch-2.7.00.patch
          12 kB
          Zhe Zhang
        2. HDFS-10966.01.patch
          11 kB
          Zhe Zhang
        3. HDFS-10966.00.patch
          9 kB
          Zhe Zhang

            People

              mwagner Mark Wagner
              zhz Zhe Zhang