Description
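After the OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction is no longer included in the costFunctions list. addCostFunction only adds a cost function whose multiplier is non-zero, but MoveCostFunction's multiplier is now set only inside the cost function itself, so it is still zero when addCostFunction checks it.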
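The registration problem can be reproduced in miniature. This is a simplified, hypothetical model, not HBase's real StochasticLoadBalancer code: the class names, the `register()` helper and the off-peak multiplier value are illustrative assumptions. Only the "skip functions whose multiplier is zero" check and the default move multiplier of 7 come from this report. A cost function that assigns its multiplier lazily inside `cost()` is silently dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the registration bug.
// These are NOT HBase's real StochasticLoadBalancer classes.
public class CostFunctionRegistration {

  abstract static class CostFunction {
    protected float multiplier = 0f;

    float getMultiplier() {
      return multiplier;
    }

    abstract double cost();
  }

  // Multiplier set eagerly in the constructor: survives registration.
  static class EagerCostFunction extends CostFunction {
    EagerCostFunction(float multiplier) {
      this.multiplier = multiplier;
    }

    @Override
    double cost() {
      return 0.5;
    }
  }

  // Multiplier assigned only inside cost(), mimicking the described change;
  // it is still 0 when addCostFunction inspects it. Values are illustrative.
  static class LazyMoveCostFunction extends CostFunction {
    private final boolean offPeak;

    LazyMoveCostFunction(boolean offPeak) {
      this.offPeak = offPeak;
    }

    @Override
    double cost() {
      multiplier = offPeak ? 3f : 7f;
      return 0.25;
    }
  }

  static final List<CostFunction> costFunctions = new ArrayList<>();

  // Mirrors the described check: only non-zero multipliers are kept.
  static void addCostFunction(CostFunction f) {
    if (f.getMultiplier() > 0) {
      costFunctions.add(f);
    }
  }

  // Registers one eager and one lazy function; returns what was kept.
  static List<CostFunction> register() {
    costFunctions.clear();
    addCostFunction(new EagerCostFunction(500f));
    addCostFunction(new LazyMoveCostFunction(false));
    return new ArrayList<>(costFunctions);
  }

  public static void main(String[] args) {
    List<CostFunction> registered = register();
    System.out.println("registered: " + registered.size());
    for (CostFunction f : registered) {
      System.out.println("  " + f.getClass().getSimpleName());
    }
  }
}
```

In this model, assigning the multiplier in the constructor (as `EagerCostFunction` does) keeps the function registered. That suggests a direction for a fix but is not the actual patch.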
As a result, hbase.master.balancer.stochastic.maxMovePercent is not respected, and no cost function penalizes moving a region. Any move that lowers the total cost at all is accepted, which causes more churn and disruption every time the balancer runs.
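The effect on plan acceptance can be sketched as follows. This is a hypothetical model, not HBase's search loop. It makes two assumptions: the move cost rises linearly to 1.0 at `maxMoves = numRegions * maxMovePercent`, and any plan that exceeds `maxMoves` gets a prohibitively large cost. The scenario of a plan that moves 1000 regions to trim 0.1 off the other costs is invented for illustration.

```java
// Hypothetical, simplified model of plan acceptance in a stochastic balancer.
// NOT HBase's actual implementation.
public class MoveCostAcceptance {

  // Assumed shape: linear in moved regions up to maxMoves; above it, prohibitive.
  static double moveCost(int movedRegions, int numRegions, double maxMovePercent) {
    int maxMoves = Math.max(1, (int) (numRegions * maxMovePercent));
    if (movedRegions > maxMoves) {
      return 1_000_000d;
    }
    return (double) movedRegions / maxMoves;
  }

  // Total = other weighted costs + weighted move cost (only if it was registered).
  static double totalCost(double otherCost, int movedRegions, int numRegions,
      double maxMovePercent, double moveMultiplier, boolean moveCostRegistered) {
    double total = otherCost;
    if (moveCostRegistered) {
      total += moveMultiplier * moveCost(movedRegions, numRegions, maxMovePercent);
    }
    return total;
  }

  // A candidate plan is kept only if it lowers the total cost.
  static boolean accepts(double currentCost, double candidateCost) {
    return candidateCost < currentCost;
  }

  // Illustrative scenario: move 1000 of 4300 regions to shave 0.1 off other costs.
  static boolean decide(boolean moveCostRegistered) {
    int numRegions = 4300;
    double maxMovePercent = 0.25;
    double moveMultiplier = 7.0;
    double current = totalCost(5.0, 0, numRegions, maxMovePercent,
        moveMultiplier, moveCostRegistered);
    double candidate = totalCost(4.9, 1000, numRegions, maxMovePercent,
        moveMultiplier, moveCostRegistered);
    return accepts(current, candidate);
  }

  public static void main(String[] args) {
    System.out.println("without MoveCostFunction: accepted=" + decide(false));
    System.out.println("with MoveCostFunction:    accepted=" + decide(true));
  }
}
```

Without the move term, the marginal 0.1 improvement wins. With it, the candidate pays about 6.5 in move cost and is rejected. A plan above `maxMoves` is effectively forbidden, which is how maxMovePercent is meant to be enforced.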
We noticed this while investigating a case where the balancer ran after a regionserver was restarted without using the region_mover script. The regionserver came online with 0 regions, which triggers the idleRegionServerExist shortcut in needsBalance, so the balancer ran to move regions onto the newly restarted regionserver. However, it moved a large number of regions across the cluster, hyper-optimizing the other cost variables. There were ~4300 regions in the cluster at the time, so a plan moving 25% of the regions should have had a final cost of at least 7 (the default MoveCostFunction weight). Instead, the logged plan moves 1095 regions and ends at a cost of about 4.8. MoveCostFunction is also missing from the functionCost breakdown of the initial cost.
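The "at least 7" figure can be checked against the log. The (multiplier, cost) pairs in the initial-cost line below sum to the logged initCost when each cost is weighted by its multiplier. This suggests the total is a plain weighted sum with no move term. The `moveTerm` helper is a hypothetical model: it assumes the move cost is `movedRegions / maxMoves` with `maxMoves = numRegions * maxMovePercent`, and it assumes a maxMovePercent of 0.25, consistent with the 25% used above.

```java
// Recomputes the initial cost from the logged functionCost pairs and
// estimates the missing move term under stated assumptions.
public class LoggedCostCheck {

  // (multiplier, unweighted cost) pairs copied from the initial-cost log line.
  static final double[][] LOGGED = {
    {500.0, 0.014878672009326464},  // RegionCountSkewCostFunction
    {35.0, 0.013600280177445717},   // TableSkewCostFunction
    {100000.0, 0.0},                // RegionReplicaHostCostFunction
    {10000.0, 0.0},                 // RegionReplicaRackCostFunction
    {5.0, 0.8296332203204705},      // ReadRequestCostFunction
    {5.0, 0.06818455421617946},     // WriteRequestCostFunction
    {5.0, 0.08132131691669181},     // MemstoreSizeCostFunction
    {5.0, 0.02054620605193966},     // StoreFileCostFunction
  };

  // Sum of multiplier * cost over all logged functions.
  static double weightedSum(double[][] pairs) {
    double total = 0;
    for (double[] p : pairs) {
      total += p[0] * p[1];
    }
    return total;
  }

  // Hypothetical move term for a plan within maxMoves: multiplier * moved / maxMoves.
  static double moveTerm(int movedRegions, int numRegions,
      double maxMovePercent, double multiplier) {
    int maxMoves = (int) (numRegions * maxMovePercent);
    return multiplier * ((double) movedRegions / maxMoves);
  }

  public static void main(String[] args) {
    System.out.println("recomputed initCost = " + weightedSum(LOGGED));
    System.out.println("move term, 1075 of 4300 regions = "
        + moveTerm(1075, 4300, 0.25, 7.0));
  }
}
```

The recomputed sum agrees with the logged initCost (12.91377229840024) to within floating-point rounding. Under these assumptions, a plan moving 25% of 4300 regions would add a move term of 7.0 on its own, so a final cost of 4.8 cannot include it.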
2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, 0.013600280177445717); RegionReplicaHostCostFunction : (100000.0, 0.0); RegionReplicaRackCostFunction : (10000.0, 0.0); ReadRequestCostFunction : (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: 1000000
2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] balancer.StochasticLoadBalancer - Finished computing new load balance plan. Computation took 30004ms to try 6571 different iterations. Found a solution that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a new cost of 4.804625730746651
Issue Links
- is caused by: HBASE-24709 Support MoveCostFunction use a lower multiplier in offpeak hours (Resolved)