Rather than creating fixed thread pools which will be idle as cluster size increases, perhaps cached thread pools that spawn dynamically would help.
The previous balancer was easy to configure. I don't fully understand the previous design but a simpler approach that achieves the same improvement would be returning to a single fixed thread pool - with intelligent queuing of work. Ie. interleaving work for all targets, with a max queued limit, so replications are distributed evenly across nodes. I'm assuming it didn't do that.
Do you have
HDFS-8824 in your runs? I suspect the first run has it but the second one does not.
over time older nodes will end up with only small blocks, if it is set permanently? It will look good for quick balancing, but may not be good in long term
Exactly. We had to disable the feature because nodes become concentrated with small blocks. getBlocks becomes increasing expensive as it searches for a dwindling number of large blocks on unbalanced nodes. The client load increases on those nodes due to block volume. Eventually the balancer just plays a shell game moving the larger blocks.
The current balancer probably works great when adding nodes, but not as a continuous service. If not reverted, something has to be done to restore previous steady state performance.