  1. HBase
  2. HBASE-3945

Load balancer shouldn't move the same region in two consecutive balancing actions

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Keeping a region on the same region server would give good stability for active scanners.
      We shouldn't reassign the same region in two successive calls to balanceCluster().
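      The rule in the description can be sketched as a filter that remembers which regions the previous balancing round moved and drops any new move for those regions. This is a minimal illustration under stated assumptions, not HBase code: the Move record and the ConsecutiveMoveFilter class are hypothetical simplifications (HBase's real move type is RegionPlan), and it assumes the filter sees every round's proposed moves in order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for one balancer move (not HBase's RegionPlan).
record Move(String region, String sourceServer, String destServer) {}

// Hypothetical filter: drops any proposed move whose region was moved by the
// previous call to balanceCluster(), so no region moves in two consecutive rounds.
class ConsecutiveMoveFilter {
    // Regions moved by the most recent balancing round.
    private Set<String> movedLastRound = new HashSet<>();

    List<Move> filter(List<Move> proposed) {
        List<Move> accepted = new ArrayList<>();
        Set<String> movedThisRound = new HashSet<>();
        for (Move m : proposed) {
            if (movedLastRound.contains(m.region())) {
                continue; // moved last round: leave it on its current server
            }
            accepted.add(m);
            movedThisRound.add(m.region());
        }
        // Only this round's moves constrain the next round; older moves are forgotten.
        movedLastRound = movedThisRound;
        return accepted;
    }
}
```

      Because only the immediately preceding round is remembered, a region blocked in one round becomes movable again in the round after that.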

        Activity

        stack stack added a comment -

        No progress in years. Resolving.

        yuzhihong@gmail.com Ted Yu added a comment -

        Let's keep this open for a while.
        At the very least, I would expect a unit test showing that such consecutive moves don't happen.
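        A unit test of this kind could record the set of regions moved by each successive balanceCluster() call and check that no region appears in two adjacent rounds. The helper below is a hypothetical sketch of that check only; how the per-round sets are collected from a real balancer is left as an assumption.

```java
import java.util.List;
import java.util.Set;

// Hypothetical test helper: given the regions moved in each balancing round,
// in order, returns a region moved in two consecutive rounds, or null if none.
class ConsecutiveMoveCheck {
    static String firstConsecutiveMove(List<Set<String>> rounds) {
        for (int i = 1; i < rounds.size(); i++) {
            for (String region : rounds.get(i)) {
                if (rounds.get(i - 1).contains(region)) {
                    return region; // moved in round i-1 and again in round i
                }
            }
        }
        return null; // no back-to-back moves found
    }
}
```

        A test would then assert that firstConsecutiveMove(...) returns null.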

        lhofhansl Lars Hofhansl added a comment -

        @Ted: Do you want to keep this open?

        yuzhihong@gmail.com Ted Yu added a comment -

        Without a new parameter, we can keep any region on its region server for at least two balancing cycles.
        This implicitly ties hbase.balancer.period to the expected duration of compaction(s).

        Or maybe I misinterpreted Stack's comment.
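        The arithmetic behind "at least two cycles" can be made explicit: if balancer rounds fire every hbase.balancer.period milliseconds and a region moved in round k may not move in round k + 1, it stays on its new server for at least two periods. A compaction that starts right after the move is therefore protected only if it finishes within that window. The sketch below assumes rounds fire exactly on schedule; the class and method names are hypothetical, and 300000 ms is just an example period.

```java
// Hypothetical calculation for the "no move in two consecutive rounds" rule.
class ResidenceTime {
    // Minimum time (ms) a freshly moved region stays on its new server,
    // assuming balancer rounds fire exactly every balancerPeriodMs.
    static long minResidenceMs(long balancerPeriodMs) {
        return 2 * balancerPeriodMs;
    }

    // True if a compaction that starts right after the move and takes
    // compactionMs finishes before the region can be moved again.
    static boolean compactionProtected(long balancerPeriodMs, long compactionMs) {
        return compactionMs <= minResidenceMs(balancerPeriodMs);
    }
}
```

        With a 300000 ms period, a 10-minute compaction fits the window but a 15-minute one does not, and a compaction that starts later after the move gets correspondingly less protection.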

        yuzhihong@gmail.com Ted Yu added a comment -

        @Jonathan:
        I agree with your comment.
        I think we should use as few knobs as possible.

        Stack suggested this approach for the problem reported by Schubert and Anty. See his comment in HBASE-3943.

        streamy Jonathan Gray added a comment -

        I worry about this approach of more and more knobs, especially when they don't directly address what a good/bad load balance really is.

        If a region gets moved in two consecutive balancing actions, then something is wrong with the balancer in the first place. While I agree in principle that regions moving multiple times in quick succession is not desirable, this will be a common outcome if the balancing algorithm doesn't already take metrics over time into account (rather than short snapshots). If we balance on load but then add all these limits/controls, it becomes hard to ever understand the behavior of the balancer.
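        One common way to take metrics over time into account, shown here purely as an illustration and not as what the HBase balancer does, is to smooth each server's load with an exponential moving average, so that one spiky sample does not by itself trigger moves. The SmoothedLoad class and its alpha parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: exponentially smoothed load per region server.
class SmoothedLoad {
    private final double alpha; // weight of the newest sample, in (0, 1]
    private final Map<String, Double> ema = new HashMap<>();

    SmoothedLoad(double alpha) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    // Folds a new load sample into the server's running average and returns it.
    double update(String server, double sample) {
        double next = ema.containsKey(server)
                ? alpha * sample + (1 - alpha) * ema.get(server)
                : sample; // the first sample seeds the average
        ema.put(server, next);
        return next;
    }
}
```

        A smaller alpha reacts more slowly to load changes, which reduces back-and-forth moves at the cost of responding later to genuine imbalance.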

        yuzhihong@gmail.com Ted Yu added a comment -

        Motivation for this JIRA was to reduce disruption to long running compactions.
        Since compaction decisions are made solely by the region server, it is not easy for the load balancer to know the exact timing and duration of compactions.
        Shall we introduce a new parameter, e.g. hbase.balancer.inert.duration, specifying how long a region should be kept on the same region server?
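        The proposed parameter would make the cooldown time-based rather than round-based, decoupling it from hbase.balancer.period. The sketch below illustrates that idea under the assumption that the balancer consults a guard before each move; hbase.balancer.inert.duration is only a name floated in this issue (resolved as Won't Fix), and the InertDurationGuard class is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard for the proposed hbase.balancer.inert.duration idea:
// a region that moved at time t may not move again until t + inertDurationMs.
class InertDurationGuard {
    private final long inertDurationMs;
    private final Map<String, Long> lastMoveMs = new HashMap<>();

    InertDurationGuard(long inertDurationMs) {
        this.inertDurationMs = inertDurationMs;
    }

    // True if the region has never moved, or its inert window has elapsed.
    boolean mayMove(String region, long nowMs) {
        Long last = lastMoveMs.get(region);
        return last == null || nowMs - last >= inertDurationMs;
    }

    // Records that the balancer moved the region at nowMs.
    void recordMove(String region, long nowMs) {
        lastMoveMs.put(region, nowMs);
    }
}
```

        Unlike the consecutive-round rule, the protection window here stays the same even if the balancer period is changed.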


          People

          • Assignee:
            Unassigned
            Reporter:
            yuzhihong@gmail.com Ted Yu
          • Votes:
            0
            Watchers:
            4
