HBASE-25697

StochasticBalancer improvement for large scale clusters


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Components: Balancer, master, UI
    • None

    Description

      Findings on a large scale cluster (100,000 regions on 300 nodes)

      • Balancer starts and stops before getting a plan
      • Adding new racks doesn’t trigger balancer
      • Balancer stops, leaving some racks with 50% lower region counts
      • Regions for large tables don’t get evenly distributed
      • Observability is poor
      • Too many knobs make tuning empirical, requiring many experiments

      Improvements made and being made

      More proposals

      • minCostNeedBalance for each cost function instead of weights. We want to trigger balancing if any single factor is out of balance, instead of trying to combine the factors with arbitrary weights. This makes operation and configuration much easier.
      • Simulated annealing: periodically lower minCostNeedBalance to unstick the balancer from a sub-optimum, then gradually raise it again to keep the system stable. Also add the cost of moves as a countermeasure in the decision. https://opensourcelibs.com/lib/tempest
      • Orchestrated scheduling of compaction, normalizer and balancer
      • PID approach https://www.amazon.com/dp/1449361692/ref=rdr_ext_tmb
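
      The first two proposals above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the StochasticLoadBalancer code: the class, the method names, the cost-function names (regionCountSkew, rackSkew, tableSkew) and all numeric values are hypothetical, and each cost is assumed to be normalized to the range 0..1. The first method triggers balancing when any single cost exceeds its own threshold; the second is a simple periodic schedule that drops the threshold and lets it recover linearly, in the spirit of the annealing idea.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names exist in HBase. */
class PerFunctionThreshold {

  /**
   * Returns the names of the cost functions whose normalized cost (0..1)
   * exceeds their own threshold. A non-empty result means "needs balance".
   */
  static List<String> outOfBalance(Map<String, Double> costs,
      Map<String, Double> thresholds, double defaultThreshold) {
    List<String> exceeded = new ArrayList<>();
    for (Map.Entry<String, Double> e : costs.entrySet()) {
      double limit = thresholds.getOrDefault(e.getKey(), defaultThreshold);
      if (e.getValue() > limit) {
        exceeded.add(e.getKey());
      }
    }
    return exceeded;
  }

  /**
   * Annealing-style schedule (an assumption, not a full simulated annealing):
   * at the start of every period the threshold drops to floor, then climbs
   * linearly back to base over recoverySteps ticks and stays there.
   */
  static double scheduledThreshold(double base, double floor, int period,
      int recoverySteps, long tick) {
    long phase = tick % period;
    if (phase >= recoverySteps) {
      return base;
    }
    return floor + (base - floor) * ((double) phase / recoverySteps);
  }

  public static void main(String[] args) {
    Map<String, Double> costs = new LinkedHashMap<>();
    costs.put("regionCountSkew", 0.02);
    costs.put("rackSkew", 0.30);
    costs.put("tableSkew", 0.04);

    Map<String, Double> thresholds = new LinkedHashMap<>();
    thresholds.put("rackSkew", 0.10); // per-function override

    // Only the rack cost is above its own limit: prints [rackSkew]
    System.out.println(outOfBalance(costs, thresholds, 0.05));

    // Threshold right after a periodic drop, then during recovery.
    for (long tick = 0; tick <= 4; tick++) {
      System.out.println(scheduledThreshold(0.05, 0.01, 100, 4, tick));
    }
  }
}
```

      With per-function thresholds, a single badly skewed factor (here the rack cost) triggers balancing even when the other factors are low, which a weighted sum could mask.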


          People

            Assignee: Unassigned
            Reporter: Clara Xiong (claraxiong)
