HBase / HBASE-3799

Provide more accurate check for super underloaded region server in load balancer

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 0.90.2
    • Fix Version/s: None
    • Component/s: mapreduce
    • Labels:
      None

      Description

      HBASE-3609 used a simple check to detect a region server that recently joined the cluster, so that both young and old regions from other region servers are assigned to it.
      The check was too strict.
      One or more regions may be assigned to this server before the load balancer performs rebalancing.
      The next time the balancer runs, it fails to recognize this server as seriously underloaded and assigns a lot of young regions to it.

      We can use a threshold on the number of regions to avoid this issue.
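
A minimal sketch of the proposed threshold, for illustration only. It assumes the existing check amounts to a zero-region test, which the description implies but does not state. The class name, method names, and the `fraction` parameter are hypothetical and are not taken from the HBase balancer code.

```java
public final class UnderloadCheck {
    private UnderloadCheck() {}

    /** Assumed form of the strict check: only a server with no regions qualifies. */
    public static boolean isEmptyServer(int regionCount) {
        return regionCount == 0;
    }

    /**
     * Hypothetical threshold check: a server counts as severely underloaded
     * when it holds at most `fraction` of the cluster-wide average region count.
     */
    public static boolean isSeverelyUnderloaded(int regionCount,
                                                double avgRegionsPerServer,
                                                double fraction) {
        return regionCount <= Math.floor(avgRegionsPerServer * fraction);
    }

    public static void main(String[] args) {
        double average = 100.0;       // average regions per server in the cluster
        int regionsOnNewServer = 2;   // assigned before the balancer ran

        // The strict check no longer recognizes the server as newly joined.
        System.out.println(isEmptyServer(regionsOnNewServer));
        // The threshold check still does, with a 10% threshold.
        System.out.println(isSeverelyUnderloaded(regionsOnNewServer, average, 0.1));
    }
}
```

With a threshold, a server that picked up a couple of regions before the balancer ran is still treated as newly joined, while a server holding half the average load is not.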

          Activity

          stack stack added a comment -

          A general comment on balancing (that probably fits better elsewhere than on this issue): we need 'smoothing' of region moves. Yesterday we brought a regionserver back online into a smallish cluster that was under load, and the balance run unloaded a bunch of regions all in one go, which put a dent in throughput. It'd be sweet if the balancer ran at an appropriate 'rate'. When under load, it should move regions 'gently' rather than all in one big bang. (The decommission script moves one region at a time, verifying it has deployed in its new location before moving another. This can take ages to complete, but it has proven minimally disruptive to load.)

          yuzhihong@gmail.com Ted Yu added a comment -

          We can limit the number (or percentage) of regions that are moved off a single region server.
          Should that be part of this JIRA?

          stack stack added a comment -

          Shouldn't be part of this issue. I'll move my comment to a new issue.

          lhofhansl Lars Hofhansl added a comment -

          @Ted: Do you still want this?

          yuzhihong@gmail.com Ted Yu added a comment -

          The notion of young vs. old regions can be more accurately described with various metrics for read / write on regions.
          I will close this JIRA for now.


            People

            • Assignee: yuzhihong@gmail.com Ted Yu
            • Reporter: yuzhihong@gmail.com Ted Yu
            • Votes: 0
            • Watchers: 2
