[HBASE-25973] Balancer should explain progress in a better way in log - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0-alpha-1
Fix Version/s: 2.5.0, 2.3.6, 3.0.0-alpha-2, 2.4.5
Component/s: Balancer
Labels: None

Description

At the beginning of a run, the balancer logs the following at INFO level:

balancer.StochasticLoadBalancer: start StochasticLoadBalancer.balancer, initCost=277.3479243125063, functionCost=RegionCountSkewCostFunction : (500.0, 0.3749771215224234); ServerLocalityCostFunction : (25.0, 0.5807483226644186); RackLocalityCostFunction : (15.0, 0.0); TableSkewCostFunction : (1000.0, 0.0019704142954972883); StoreFileCostFunction : (200.0, 0.3668512059459341);  computedMaxSteps: 42270438200

The cost is reported without context, so it is hard for an operator to understand how unbalanced the balancer considers the cluster to be and how much progress is being made.
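To illustrate what "context" could look like, the sketch below recomputes the total from the (multiplier, cost) pairs in the log line above and relates it to the sum of the multipliers. The weighted sum reproduces the logged initCost for this sample. The class name `BalancerCostSummary`, its methods, and the idea of dividing by the total multiplier are assumptions made for illustration only; they are not part of HBase and not necessarily what the fix does.

```java
// Hypothetical helper, not part of HBase. The numbers are copied from the
// log line quoted in this issue.
class BalancerCostSummary {

    // Each row is {multiplier, cost} for one cost function, in log order:
    // RegionCountSkew, ServerLocality, RackLocality, TableSkew, StoreFile.
    static final double[][] SAMPLE = {
        {500.0, 0.3749771215224234},
        {25.0, 0.5807483226644186},
        {15.0, 0.0},
        {1000.0, 0.0019704142954972883},
        {200.0, 0.3668512059459341},
    };

    // Weighted sum of the per-function costs: sum(multiplier * cost).
    static double weightedTotal(double[][] pairs) {
        double total = 0.0;
        for (double[] pair : pairs) {
            total += pair[0] * pair[1];
        }
        return total;
    }

    // Assumption: dividing by the sum of the multipliers gives a relative
    // figure. It stays within [0, 1] as long as every cost is within [0, 1],
    // which holds for the values in this log line.
    static double normalized(double[][] pairs) {
        double sumOfMultipliers = 0.0;
        for (double[] pair : pairs) {
            sumOfMultipliers += pair[0];
        }
        return sumOfMultipliers == 0.0 ? 0.0 : weightedTotal(pairs) / sumOfMultipliers;
    }

    public static void main(String[] args) {
        System.out.println("weighted total = " + weightedTotal(SAMPLE));
        System.out.println("relative to sum of multipliers = " + normalized(SAMPLE));
    }
}
```

For the sample line, the weighted total matches the logged initCost of 277.3479243125063 (up to floating-point rounding), and the sum of the multipliers is 1740, so the relative figure is about 0.16. A number on a fixed scale like this is easier for an operator to interpret than the raw total.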

For a large cluster, the calculation can take a long time. We also need to let the operator know that the calculation may take up to the max time to complete.

At the end of the computation:

balancer.StochasticLoadBalancer: Finished computing new load balance plan. Computation took PT40M0.006S to try 1036409 different iterations. Found a solution that moves 161926 regions; Going from a computed cost of 118.75715593924485 to a new cost of 1.5509126920967042

The time taken to compute the plan is also printed in a format that is not human readable. We also need to let the operator understand that the balancer is only submitting the plan, and that it is up to execution to complete the moves.
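The value PT40M0.006S in the log is the ISO-8601 form that `java.time.Duration.toString()` produces. A minimal sketch of a friendlier rendering follows; the class `DurationFormatExample` and the chosen output format are hypothetical and only show one possible approach, not the format the fix adopted.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper, not part of HBase.
class DurationFormatExample {

    // Renders a non-negative Duration as hours, minutes, and seconds with
    // millisecond precision, omitting leading units that are zero.
    static String humanReadable(Duration duration) {
        long millis = duration.toMillis();
        long hours = millis / 3_600_000L;
        long minutes = (millis % 3_600_000L) / 60_000L;
        long remainderMillis = millis % 60_000L;

        StringBuilder sb = new StringBuilder();
        if (hours > 0) {
            sb.append(hours).append(" h ");
        }
        if (hours > 0 || minutes > 0) {
            sb.append(minutes).append(" min ");
        }
        sb.append(String.format(Locale.ROOT, "%d.%03d s",
            remainderMillis / 1000, remainderMillis % 1000));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Parses the exact value from the log line quoted in this issue.
        System.out.println(humanReadable(Duration.parse("PT40M0.006S")));
    }
}
```

With the value from the log, this prints "40 min 0.006 s".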