The default block placement policy chooses datanodes for new blocks randomly, which leaves the used-space percentage unbalanced across datanodes after a cluster expansion: the old datanodes stay at a high used-space percentage while the newly added ones stay low.
Although we can use the external balancer tool to even out space usage, it costs extra network IO and its balancing speed is not easy to control.
An easy solution is to implement a balanced block placement policy that chooses datanodes with a low used-space percentage for new blocks with slightly higher probability. Before long, the used-space percentages of the datanodes will tend toward balance.
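To make the idea concrete, here is a minimal sketch of one way such a biased choice could work. It is not the HDFS BlockPlacementPolicy API: the class BalancedChooser, the Node type, and the preference parameter are all hypothetical names made up for illustration. The sketch assumes we pick two candidates at random and return the less-used one with probability preference (a value such as 0.6 is only an example).

```java
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a space-balanced choice; not an actual HDFS class. */
public class BalancedChooser {

    /** Minimal stand-in for a datanode: a name and its used-space percentage. */
    public static final class Node {
        public final String name;
        public final double usedPercent;

        public Node(String name, double usedPercent) {
            this.name = name;
            this.usedPercent = usedPercent;
        }
    }

    // Probability of returning the less-used of the two candidates (assumed knob).
    private final double preference;
    private final Random random;

    public BalancedChooser(double preference, Random random) {
        if (preference < 0.5 || preference > 1.0) {
            throw new IllegalArgumentException("preference must be in [0.5, 1.0]");
        }
        this.preference = preference;
        this.random = random;
    }

    /** Picks two random candidates and favors the one with the lower used percent. */
    public Node choose(List<Node> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no datanodes to choose from");
        }
        Node a = nodes.get(random.nextInt(nodes.size()));
        Node b = nodes.get(random.nextInt(nodes.size()));
        Node lower = (a.usedPercent <= b.usedPercent) ? a : b;
        Node higher = (lower == a) ? b : a;
        // With probability `preference` return the less-used node, otherwise the other.
        return random.nextDouble() < preference ? lower : higher;
    }
}
```

With preference at 0.5 this degenerates to the plain random choice, and values closer to 1.0 fill the new datanodes faster at the cost of concentrating write load on them. The old datanodes still receive some new blocks, so writes are not directed only at the newly added nodes.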
Suggestions and discussion are welcome. Thanks.