Before 2.6, the Balancer moved blocks between Datanodes.
Since 2.6, the Balancer moves blocks between StorageGroups (introduced by
HDFS-6584).
The StorageGroup-based scheduling is flawed: it may cause 2 replicas to end up on the same node after running the Balancer.
We have 2 nodes. Each node has two storages.
We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
We have a block with ONE_SSD storage policy.
The block has 2 replicas. They are in (DN0, SSD) and (DN1, DISK).
The replica in (DN0, SSD) should not be moved to (DN1, SSD) by the Balancer.
Otherwise DN1 would hold both replicas.
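
The scenario above can be sketched in a few lines of Java. This is only an illustration, not the actual Balancer code: the class BalancerSketch, its StorageGroup type, and both helper methods are hypothetical names made up for this example. It shows why a check that compares whole (Datanode, storage type) pairs accepts (DN1, SSD) as a target, while a check that compares Datanodes only rejects it.

```java
import java.util.Arrays;
import java.util.List;

public class BalancerSketch {
    /** Hypothetical stand-in for a (Datanode, storage type) pair. */
    static final class StorageGroup {
        final String datanode;
        final String storageType;

        StorageGroup(String datanode, String storageType) {
            this.datanode = datanode;
            this.storageType = storageType;
        }
    }

    /** StorageGroup-level check: true only if a replica sits on exactly this pair. */
    static boolean isLocatedOnStorageGroup(List<StorageGroup> replicas, StorageGroup target) {
        for (StorageGroup r : replicas) {
            if (r.datanode.equals(target.datanode)
                    && r.storageType.equals(target.storageType)) {
                return true;
            }
        }
        return false;
    }

    /** Datanode-level check: true if any replica lives on the target's Datanode. */
    static boolean isLocatedOnDatanode(List<StorageGroup> replicas, StorageGroup target) {
        for (StorageGroup r : replicas) {
            if (r.datanode.equals(target.datanode)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Replicas from the example: (DN0, SSD) and (DN1, DISK).
        List<StorageGroup> replicas = Arrays.asList(
                new StorageGroup("DN0", "SSD"),
                new StorageGroup("DN1", "DISK"));
        StorageGroup target = new StorageGroup("DN1", "SSD");

        // The pair-level check misses the replica already on DN1.
        System.out.println(isLocatedOnStorageGroup(replicas, target)); // false
        // The node-level check sees it, so the move would be skipped.
        System.out.println(isLocatedOnDatanode(replicas, target));     // true
    }
}
```

Under these assumptions, a scheduler that relies only on the pair-level check would treat (DN1, SSD) as a valid target for the replica in (DN0, SSD).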
UPDATE (thanks to Tsz Wo Nicholas Sze for pointing this out):
This bug will NOT cause 2 replicas to end up on the same node after running the Balancer, because the Datanode rejects the move.
We see a lot of ERROR messages when running the test.
The Balancer runs 5~20 iterations in the test before it exits.
The Balancer should not schedule such a move in the first place, even though it will fail anyway. In the test, it should exit after 5 iterations.