HBASE-3779

Allow split regions to be placed on different region servers

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.90.2
    • Fix Version/s: None
    • Component/s: master
    • Labels:
      None

      Description

      Currently daughter regions are placed on the same region server where the parent region was.
      Stanislav Barton mentioned the idea that load information should be considered when placing the daughter regions.
      The rationale is that the daughter regions tend to receive more writes. So it would be beneficial to place at least one daughter region on a different region server.

        Activity

        Jean-Daniel Cryans added a comment - edited

        This shouldn't be done inline with the split, though, as it would take that whole key space offline for even longer. The way it currently works is actually an optimization in that regard.

        What could be done, though, is moving the region with the lesser load after the dust settles. But even then, you might want different behavior for the heavy-import scenario versus the realtime-serving one. In the former, if the import is even across the regions, the region servers should all be splitting as much as one another, so moving regions does no good.

        zhoushuaifeng added a comment -

        This is a problem that should be considered. When a region is splitting, it is usually heavily loaded, so the daughter regions are usually heavily loaded too. The situation gets even worse when the same thing happens to the daughters. If the daughters are all opened on the same server, that server may become overloaded and the performance of the system may degrade.
        It would be better to consider load information when placing the daughter regions. Simply assigning the daughter regions at random would be a reasonable start.

        Ted Yu added a comment -

        I think we can introduce a new config parameter, hbase.daughter.region.placement.
        Suppose we name the current daughter placement policy SAME_HOST and use it as the default.

        We can add another policy, ONE_ON_LEAST_LOADED, where one of the daughters is placed on the least loaded region server.
        AssignmentManager.getAssignments() can remember the most recent assignments.
        AssignmentManager.handleSplitReport() makes use of this information and assigns one daughter to the least loaded server.
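        The proposed policy selection could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the class, method, and load-map shape are invented for illustration, not HBase's actual API; load is taken to be the region count per server, as in the 0.90 balancer.

        ```java
        import java.util.Map;

        // Hypothetical sketch of the proposed daughter-placement policy.
        // DaughterPlacement and pickDaughterHost are illustrative names, not HBase API.
        public class DaughterPlacementSketch {

            enum DaughterPlacement { SAME_HOST, ONE_ON_LEAST_LOADED }

            // Load here is simply the number of regions per server.
            static String pickDaughterHost(DaughterPlacement policy,
                                           String parentHost,
                                           Map<String, Integer> regionsPerServer) {
                if (policy == DaughterPlacement.SAME_HOST || regionsPerServer.isEmpty()) {
                    return parentHost;  // current behavior: daughter stays with the parent
                }
                // ONE_ON_LEAST_LOADED: choose the server holding the fewest regions.
                String best = parentHost;
                int bestLoad = Integer.MAX_VALUE;
                for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
                    if (e.getValue() < bestLoad) {
                        bestLoad = e.getValue();
                        best = e.getKey();
                    }
                }
                return best;
            }
        }
        ```

        Under SAME_HOST the parent's server is always returned, so the sketch degenerates to today's behavior; only ONE_ON_LEAST_LOADED consults the load map.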

        gaojinchao added a comment -

        I agree. It needs a new config parameter, and it would be beneficial for write-heavy workloads.

        Ted Yu added a comment -

        First version introduces the "hbase.daughter.region.placement" parameter.

        Michael Segel added a comment -

        Just a silly question... how do you determine the load on a region server? How does a regionserver track the loads of other region servers? How often is the load recalculated? Is it also a weighted load based on load over time?

        A random placement would be a start and maybe that's all that one needs... trying to calculate which region server to place a split on may be too costly.

        Also, with HBASE-3586 (Improve the selection of regions to balance), wouldn't this be kind of redundant? I mean, do a random transfer and then let HBase rebalance over time?

        Sorry to jump in at the end of this...

        Ted Yu added a comment -

        Please refer to my blog: http://zhihongyu.blogspot.com/2011/04/load-balancer-in-hbase-090.html
        One of the design guidelines I follow is to avoid introducing randomness where possible.

        The load is currently measured by the number of regions on the region server. There are plans to improve this measure toward a weighted load based on load over time.
        The master runs the balancer every "hbase.balancer.period" milliseconds, 5 minutes by default.
        My patch latches onto the most recent run of the balancer, where the least loaded server was recorded. If the ONE_ON_LEAST_LOADED policy is chosen, one daughter region is offloaded to that server.

        HBASE-3586 has been improved through HBASE-3609.
        The goal of this JIRA is to fulfill part of the balancer's job at minimal cost.
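        The "latch onto the most recent balancer run" idea might be sketched like this. The class and method names below are hypothetical, not from the actual patch; the only assumptions taken from the comment are that load is the per-server region count and that the balancer runs periodically (every "hbase.balancer.period" ms, default 5 minutes).

        ```java
        import java.util.Map;

        // Hedged sketch: the balancer records the least-loaded server (by region
        // count) each time it runs, and split handling reuses that cached value.
        public class LeastLoadedCache {
            private volatile String leastLoadedServer;  // null until the balancer has run

            // Called once per balancer run (every hbase.balancer.period ms).
            void onBalancerRun(Map<String, Integer> regionsPerServer) {
                String best = null;
                int bestLoad = Integer.MAX_VALUE;
                for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
                    if (e.getValue() < bestLoad) {
                        bestLoad = e.getValue();
                        best = e.getKey();
                    }
                }
                leastLoadedServer = best;
            }

            // Used when a split is reported; falls back to the parent's server
            // if the balancer has not run yet.
            String targetFor(String parentServer) {
                String s = leastLoadedServer;
                return s != null ? s : parentServer;
            }
        }
        ```

        Note the value may be up to one balancer period stale; the trade-off is that split handling pays no extra cost to compute cluster load itself.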

        stack added a comment -

        So, as per J-D, opening daughters on the same regionserver is an improvement over how things used to work; it makes regions come back online faster. (Previously, a region would split, we'd tell the master about the new daughters, and it managed their (random) assignment; the involvement of this extra master agent would often result in longer outages across splits.)

        Ted, I'll look at your patch in a second, but 1. why don't your new balancing improvements take care of this case? The daughter regions are 'new', so they should be candidates for moving when the balancer next runs. And 2. it'd be grand if we could tend toward less config, if that is at all possible.

        stack added a comment -

        Needs tests to prove it does as advertised

        The HMaster change is cleanup of drudge left over from your last balancer improvement?

        Does this work?

        -    regionOnline(b, hsi);
        +    if (daughterPlacement == DaughterPlacement.SAME_HOST) {
        +      regionOnline(b, hsi);
        +    } else if (daughterPlacement == DaughterPlacement.ONE_ON_LEAST_LOADED) {
        +      if (leastLoadedServer != null) {
        +        LOG.info("placing " + b + " on " + leastLoadedServer);
        +        regionOnline(b, leastLoadedServer);
        +      }
        +      else regionOnline(b, hsi);
        +    }
        

        IIRC, this code is tickled when we get report of split. We are noting in the Master's memory that the daughters are up on host 'hsi'. In the above, if ONE_ON_LEAST_LOADED, you are saying the region is on the least loaded server. Writing this into Master memory state does not make it so; the region is still going to be out on the regionserver where the split happened? Right?

        Ted Yu added a comment -

        The improvement from HBASE-3609 doesn't cover this JIRA in the following scenario:
        Region server A isn't overloaded in terms of the number of regions on it. One of the regions of table B is actively written to and is split. Still, there is no guarantee that the number of regions on A is above the sloppiness threshold introduced by HBASE-3681.

        The introduction of a new parameter is mostly due to differing opinions on how the daughters should be handled.

        stack added a comment -

        OK. Thanks Ted. That makes sense. The patch doesn't work though, right? And as regards config, I'd say that if you've come up with a means of figuring out overloaded daughters post-split, the default should be to have it enabled rather than disabled.

        Ted Yu added a comment -

        Second attempt.

        Ted Yu added a comment -

        I tested patch version 2 on our staging cluster. It works:

        2011-04-20 05:09:48,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: placing REGION => {NAME => 'GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1303255874782,69812FFB7A81D555FE61CA80130A81B5,1303276182555.79c4ba0610797cfdadec17c14a692a59.', STARTKEY => '69812FFB7A81D555FE61CA80130A81B5', ENDKEY => '6V\xBEp\xAA\xFF@\x0Fs\xE8\x91\x12\xB1zc\xB4\xB7\xF5\x87Y\xFC\x02}\xA1F\x8A\x97\x8D\xD5\x1F\xA7\xB8', ... on serverName=us01-ciqps1-grid07.carrieriq.com,60020,1303275849244, load=(requests=0, regions=325, usedHeap=23, maxHeap=3973) from serverName=us01-ciqps1-grid02.carrieriq.com,60020,1303275849713, load=(requests=0, regions=341, usedHeap=2118, maxHeap=3973)
        

        From table.jsp, I verified that the above region was indeed on grid07.

        stack added a comment -

        I took a look at the patch. So, what's going to happen, if we enable moving off one daughter, is that we'll split, open the regions on the parent region's regionserver, then the master will be notified and we will immediately run the balancer and move the top half of the split (a close of a just-opened region). Is this really what we want? The poor old clients will be doing a bunch of lookups in here to try and figure out the new location; they will have to recalibrate for the new daughter region's location twice in a short amount of time. They could time out before they find its new location.

        Ted Yu added a comment -

        The regionserver splitter needs to be changed for a more effective transition of a daughter region off of the parent region's region server.

        Lars Hofhansl added a comment -

        @Ted: Do you want to keep this open?

        Ted Yu added a comment -

        Please keep this open.
        We just need to find a good approach for this issue.

        stack added a comment -

        Closing. This issue introduces a regression. We did a load of work to change splitting so both daughters came up on the same server, improving MTTR. Why would we go back to the old system?


          People

          • Assignee:
            Ted Yu
          • Reporter:
            Ted Yu
          • Votes:
            0
          • Watchers:
            5
