When a Dispatcher thread works on a source DataNode, it tries to execute a PendingMove in each iteration. If no block is moved after 5 iterations, this source (over-utilized) DataNode is given up for the rest of the current Balancer iteration (20 mins). This is problematic if the source DataNode is heavily loaded at the beginning of the iteration: it quickly encounters 5 unsuccessful moves and is abandoned, even though it may be able to serve moves later in the same iteration.
We should enhance this logic, e.g. by using elapsed time instead of the number of iterations to decide when to give up on a source.
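
To make the proposal concrete, here is a minimal, self-contained sketch that contrasts the two give-up policies. It is not the actual Dispatcher code: the class name, method names, parameters, and the injected clock are all hypothetical and exist only for illustration. It also assumes that the "no block moved" counter tracks consecutive failed attempts, and the thresholds in `main` are arbitrary.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the real Dispatcher: all names are illustrative.
final class SourceGiveUpSketch {

    // Count-based policy: give up after maxNoMoveIterations consecutive
    // attempts that moved nothing (assumption: the counter resets on success).
    static int dispatchByCount(BooleanSupplier tryMove,
                               int maxNoMoveIterations,
                               int maxIterations) {
        int moved = 0;
        int noMove = 0;
        for (int i = 0; i < maxIterations && noMove < maxNoMoveIterations; i++) {
            if (tryMove.getAsBoolean()) {
                moved++;
                noMove = 0;
            } else {
                noMove++;
            }
        }
        return moved;
    }

    // Time-based policy: give up only when no block has moved for
    // maxNoMoveMs, or when the whole iteration budget maxTotalMs is spent.
    // The clock is injected so the policy can be exercised without sleeping.
    static int dispatchByElapsedTime(BooleanSupplier tryMove,
                                     LongSupplier clockMs,
                                     long maxNoMoveMs,
                                     long maxTotalMs) {
        final long start = clockMs.getAsLong();
        long lastSuccess = start;
        int moved = 0;
        while (true) {
            long now = clockMs.getAsLong();
            if (now - start >= maxTotalMs) {
                break; // iteration budget exhausted
            }
            if (tryMove.getAsBoolean()) {
                moved++;
                lastSuccess = now;
            } else if (now - lastSuccess >= maxNoMoveMs) {
                break; // source has been unproductive for too long
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Simulated source: too busy for the first 10 attempts, then available.
        int[] a1 = {0};
        BooleanSupplier src1 = () -> ++a1[0] > 10;
        System.out.println("count-based moves: " + dispatchByCount(src1, 5, 100));

        int[] a2 = {0};
        BooleanSupplier src2 = () -> ++a2[0] > 10;
        long[] t = {0};
        LongSupplier fakeClock = () -> t[0] += 1000; // each reading advances 1 s
        System.out.println("time-based moves: "
            + dispatchByElapsedTime(src2, fakeClock, 60_000L, 20_000L));
    }
}
```

With the simulated source, the count-based policy abandons the DataNode during its initial busy period and moves nothing, while the time-based policy keeps retrying until the source becomes available. A real change would also need to pick the no-move window relative to the Balancer iteration length and avoid busy-waiting between attempts.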