HBASE-25726 (HBase)

MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0
    • Fix Version/s: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3, 2.3.6
    • Component/s: Balancer
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      After the OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction is no longer included in the costFunctions list. addCostFunction expects the multiplier to be non-zero, but the multiplier is now set only inside the cost function, so it is still zero when addCostFunction checks it.
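The registration problem described above can be sketched in a few lines of Java. This is a minimal illustration, not the HBase source: the class names `EagerMoveCost` and `LazyMoveCost`, the helper `registeredCount`, and the exact check in `addCostFunction` are hypothetical stand-ins, and the assumption that the multiplier is assigned when `cost()` runs is one reading of "only set in cost function".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for illustration; not the actual HBase classes. */
public class CostFunctionRegistrationSketch {

  /** Minimal cost function: a weight (multiplier) plus a scaled cost. */
  abstract static class CostFunction {
    private float multiplier = 0f;

    float getMultiplier() {
      return multiplier;
    }

    void setMultiplier(float m) {
      this.multiplier = m;
    }

    abstract double cost();
  }

  /** Multiplier assigned at construction time, so registration sees it. */
  static class EagerMoveCost extends CostFunction {
    EagerMoveCost(float weight) {
      setMultiplier(weight);
    }

    @Override
    double cost() {
      return 0.0;
    }
  }

  /** Multiplier assigned only when cost() runs (assumed shape of the bug). */
  static class LazyMoveCost extends CostFunction {
    private final float weight;

    LazyMoveCost(float weight) {
      this.weight = weight;
    }

    @Override
    double cost() {
      setMultiplier(weight);
      return 0.0;
    }
  }

  /** Registers a function only if its multiplier is already non-zero (positive here). */
  static void addCostFunction(List<CostFunction> fns, CostFunction fn) {
    if (fn.getMultiplier() > 0) {
      fns.add(fn);
    }
  }

  /** Returns how many functions end up registered when only fn is offered. */
  static int registeredCount(CostFunction fn) {
    List<CostFunction> fns = new ArrayList<>();
    addCostFunction(fns, fn);
    return fns.size();
  }

  public static void main(String[] args) {
    System.out.println("eager registered: " + registeredCount(new EagerMoveCost(7f)));
    System.out.println("lazy registered: " + registeredCount(new LazyMoveCost(7f)));
  }
}
```

In this sketch the lazily weighted function is silently dropped at registration because `cost()` has not yet been called, which is the kind of ordering problem the description points to.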

      As a result, hbase.master.balancer.stochastic.maxMovePercent is not respected, and no cost function opposes a move. Any move that lowers the total cost, however slightly, is accepted, which causes more churn and disruption from balancer executions.

      We noticed this while investigating a case where the balancer ran after a regionserver was restarted without the region_mover script. The regionserver came online with 0 regions, which triggered the idleRegionServerExist shortcut in needsBalance. The balancer then ran to move regions to the newly restarted regionserver. However, it moved a large number of regions across the cluster, hyper-optimizing the other cost variables. There were ~4300 regions in the cluster at the time, so moving 25% of them should have produced a final cost of at least 7 (the default MoveCostFunction weight). MoveCostFunction is also missing from the list of functions contributing to the initial cost.

      2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, 0.013600280177445717); RegionReplicaHostCostFunction : (100000.0, 0.0); RegionReplicaRackCostFunction : (10000.0, 0.0); ReadRequestCostFunction : (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: 1000000

      2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] balancer.StochasticLoadBalancer - Finished computing new load balance plan. Computation took 30004ms to try 6571 different iterations. Found a solution that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a new cost of 4.804625730746651
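The numbers in the two log lines can be cross-checked with a short calculation. The sketch below assumes the total cost is the sum of multiplier times scaled cost over the listed functions (an assumption that the logged initCost happens to match), and it assumes a move-cost term that reaches its full weight of 7 at the 25% limit. The class and method names are hypothetical, and 4300 is the approximate region count quoted in the description.

```java
public class LoggedCostCheck {

  // {multiplier, scaled cost} pairs copied from the initCost log line.
  static final double[][] LOGGED = {
    {500.0, 0.014878672009326464},  // RegionCountSkewCostFunction
    {35.0, 0.013600280177445717},   // TableSkewCostFunction
    {100000.0, 0.0},                // RegionReplicaHostCostFunction
    {10000.0, 0.0},                 // RegionReplicaRackCostFunction
    {5.0, 0.8296332203204705},      // ReadRequestCostFunction
    {5.0, 0.06818455421617946},     // WriteRequestCostFunction
    {5.0, 0.08132131691669181},     // MemstoreSizeCostFunction
    {5.0, 0.02054620605193966},     // StoreFileCostFunction
  };

  static final double LOGGED_INIT_COST = 12.91377229840024;
  static final double LOGGED_FINAL_COST = 4.804625730746651;
  static final double MOVE_COST_WEIGHT = 7.0;  // default weight quoted in the description

  /** Sum of multiplier * cost over all terms (assumed total-cost formula). */
  static double weightedSum(double[][] terms) {
    double total = 0.0;
    for (double[] term : terms) {
      total += term[0] * term[1];
    }
    return total;
  }

  /** Fraction of regions moved by a plan. */
  static double movedFraction(int moved, int totalRegions) {
    return (double) moved / totalRegions;
  }

  public static void main(String[] args) {
    System.out.println("recomputed initCost: " + weightedSum(LOGGED));
    System.out.println("moved fraction: " + movedFraction(1095, 4300));
    System.out.println("move weight exceeds final cost: "
        + (MOVE_COST_WEIGHT > LOGGED_FINAL_COST));
  }
}
```

Under these assumptions the eight logged terms alone account for the reported initCost, leaving no room for a move-cost term, and a plan moving 1095 of roughly 4300 regions sits just above 25% yet still ends with a total cost below 7.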


              People

              • Assignee: David Manning (dmanning)
              • Reporter: David Manning (dmanning)
