HBase
  1. HBase
  2. HBASE-5914

Bulk assign regions in the process of ServerShutdownHandler

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In the process of ServerShutdownHandler, we currently assign regions singly.

      In the large cluster, one regionserver always carried many regions, this action is quite slow.

      What about using bulk assign regions like cluster start up.

      In current logic, if we failed assigning many regions to one destination server, we will wait unitl timeout,
      however in the process of ServerShutdownHandler, we should retry it to another server.

      1. HBASE-5914v3.patch
        6 kB
        chunhui shen
      2. HBASE-5914v2.patch
        5 kB
        chunhui shen
      3. HBASE-5914.patch
        5 kB
        chunhui shen

        Issue Links

          Activity

          Hide
          stack added a comment -

          Marking closed.

          Show
          stack added a comment - Marking closed.
          Hide
          Enis Soztutar added a comment -

          ramkrishna.s.vasudevan Ram, do you think that it is safe for backport into 0.94? It seems a good improvement for SSH.

          Show
          Enis Soztutar added a comment - ramkrishna.s.vasudevan Ram, do you think that it is safe for backport into 0.94? It seems a good improvement for SSH.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #197 (See https://builds.apache.org/job/HBase-TRUNK-security/197/)
          HBASE-5914 Bulk assign regions in the process of ServerShutdownHandler (Chunhui) (Revision 1336308)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #197 (See https://builds.apache.org/job/HBase-TRUNK-security/197/ ) HBASE-5914 Bulk assign regions in the process of ServerShutdownHandler (Chunhui) (Revision 1336308) Result = FAILURE tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2858 (See https://builds.apache.org/job/HBase-TRUNK/2858/)
          HBASE-5914 Bulk assign regions in the process of ServerShutdownHandler (Chunhui) (Revision 1336308)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2858 (See https://builds.apache.org/job/HBase-TRUNK/2858/ ) HBASE-5914 Bulk assign regions in the process of ServerShutdownHandler (Chunhui) (Revision 1336308) Result = FAILURE tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          Hide
          Ted Yu added a comment -

          Integrated to trunk.

          Thanks for the patch, Chunhui.

          Thanks for the review, Stack and Rama.

          Show
          Ted Yu added a comment - Integrated to trunk. Thanks for the patch, Chunhui. Thanks for the review, Stack and Rama.
          Hide
          stack added a comment -

          +1 on v3. Nice numbers in your test too Chunhui.

          Show
          stack added a comment - +1 on v3. Nice numbers in your test too Chunhui.
          Hide
          ramkrishna.s.vasudevan added a comment -

          I am +1 on this. But did not do any tests.

          Show
          ramkrishna.s.vasudevan added a comment - I am +1 on this. But did not do any tests.
          Hide
          Ted Yu added a comment -

          I was able to get TestDrainingServer pass on MacBook.

          Will integrate patch v3 if there is no objection.

          Show
          Ted Yu added a comment - I was able to get TestDrainingServer pass on MacBook. Will integrate patch v3 if there is no objection.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12526119/HBASE-5914v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestDrainingServer

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1811//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1811//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1811//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12526119/HBASE-5914v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestDrainingServer Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1811//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1811//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1811//console This message is automatically generated.
          Hide
          chunhui shen added a comment -

          Patch v3 change the method name from quickAssign to assign.
          All failed tests are passed.

          Also, I have done a performance test.

          2012-05-09 14:29:45,364 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
          Recovery time calculation: reassigning 4000 regions, took 14847ms

          2012-05-09 14:38:00,695 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
          Recovery time calculation: reassigning 4001 regions, took 3242ms

          With the patch, the took time of reassigning 4000 regions in the process of SSH is reduced from 14s to 3s.

          Show
          chunhui shen added a comment - Patch v3 change the method name from quickAssign to assign. All failed tests are passed. Also, I have done a performance test. 2012-05-09 14:29:45,364 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Recovery time calculation: reassigning 4000 regions, took 14847ms 2012-05-09 14:38:00,695 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Recovery time calculation: reassigning 4001 regions, took 3242ms With the patch, the took time of reassigning 4000 regions in the process of SSH is reduced from 14s to 3s.
          Hide
          stack added a comment -

          Could you call the new method assign overloading existing assigns rather than quickAssign?

          Do you know why the above tests failed? Is it related?

          Show
          stack added a comment - Could you call the new method assign overloading existing assigns rather than quickAssign? Do you know why the above tests failed? Is it related?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525380/HBASE-5914v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
          org.apache.hadoop.hbase.TestDrainingServer
          org.apache.hadoop.hbase.master.TestAssignmentManager

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1778//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1778//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1778//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525380/HBASE-5914v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.TestDrainingServer org.apache.hadoop.hbase.master.TestAssignmentManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1778//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1778//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1778//console This message is automatically generated.
          Hide
          ramkrishna.s.vasudevan added a comment -

          Fine Ted. I agree.

          Show
          ramkrishna.s.vasudevan added a comment - Fine Ted. I agree.
          Hide
          Ted Yu added a comment -

          I think round robin is good enough for this issue.
          When a concrete non-round robin algorithm is developed, we can consider making that a choice.

          Show
          Ted Yu added a comment - I think round robin is good enough for this issue. When a concrete non-round robin algorithm is developed, we can consider making that a choice.
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Chunhui
          Yes, in SSH case retain assignment is not valid. But what if i don't need the roundrobin assignment itself to be called like how we have in master startup? I just meant a similar configuration like the 'hbase.master.startup.retainassign', for eg.,
          'hbase.master.servershutdown.roundrobinassign'something like this. Just my thoughts.

          Show
          ramkrishna.s.vasudevan added a comment - @Chunhui Yes, in SSH case retain assignment is not valid. But what if i don't need the roundrobin assignment itself to be called like how we have in master startup? I just meant a similar configuration like the 'hbase.master.startup.retainassign', for eg., 'hbase.master.servershutdown.roundrobinassign'something like this. Just my thoughts.
          Hide
          chunhui shen added a comment -

          how do we know that regions have been assigned ?

          We don't need wait until regions assigned, and quickAssign() just ensure the rpc of open region is sent successfully.
          For example, in current process of ServerShutdownHandler, it just call AssignmentManager#assign(region, true)(return void) and didn't wait

          Show
          chunhui shen added a comment - how do we know that regions have been assigned ? We don't need wait until regions assigned, and quickAssign() just ensure the rpc of open region is sent successfully. For example, in current process of ServerShutdownHandler, it just call AssignmentManager#assign(region, true)(return void) and didn't wait
          Hide
          Ted Yu added a comment -

          I think we needn't whether servers is empty, if it is empty, bulkPlan is null

          Is that so ?

          +      servers.removeAll(failedPlans.keySet());
          

          If servers become empty (failedPlans is not empty), we don't need to execute the following loop:

          +      for (Map.Entry<ServerName, List<HRegionInfo>> e : failedPlans.entrySet()) {
          

          Coming out of the top-most quickAssign() call, how do we know that regions have been assigned ?

          Show
          Ted Yu added a comment - I think we needn't whether servers is empty, if it is empty, bulkPlan is null Is that so ? + servers.removeAll(failedPlans.keySet()); If servers become empty (failedPlans is not empty), we don't need to execute the following loop: + for (Map.Entry<ServerName, List<HRegionInfo>> e : failedPlans.entrySet()) { Coming out of the top-most quickAssign() call, how do we know that regions have been assigned ?
          Hide
          chunhui shen added a comment -

          @ram
          In current process of ServerShutdownHandler, we don't wait until no regions in transition, and we could keep it.

          can we have a configuration like how we have in the master start up to ensure whether to use roundRobin or not.

          When master startup, we use the configuration of "hbase.master.startup.retainassign" to determine what type of assignment to do on startup,(retainassign or round-robin fashion), however retainassign is not appropriate in the process of ServerShutdownHandler since the old server is already be dead.

          Show
          chunhui shen added a comment - @ram In current process of ServerShutdownHandler, we don't wait until no regions in transition, and we could keep it. can we have a configuration like how we have in the master start up to ensure whether to use roundRobin or not. When master startup, we use the configuration of "hbase.master.startup.retainassign" to determine what type of assignment to do on startup,(retainassign or round-robin fashion), however retainassign is not appropriate in the process of ServerShutdownHandler since the old server is already be dead.
          Hide
          chunhui shen added a comment -

          @ted
          In the v2 patch, I remove

          +    if(regions.isEmpty()) return;

          ,so quickAssign must assign successfully now.

          +      servers.removeAll(failedPlans.keySet());

          I think we needn't whether servers is empty, if it is empty, bulkPlan is null, and we will assign region singly.

          Show
          chunhui shen added a comment - @ted In the v2 patch, I remove + if (regions.isEmpty()) return ; ,so quickAssign must assign successfully now. + servers.removeAll(failedPlans.keySet()); I think we needn't whether servers is empty, if it is empty, bulkPlan is null, and we will assign region singly.
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Chunhui
          Idea is good.
          But do we need

          boolean waitUntilNoRegionsInTransition(final long timeout)
          

          such wait mechanisms? Also can we have a configuration like how we have in the master start up to ensure whether to use roundRobin or not.
          Because if i have a custom load balancer i will get a call back in this case for round robin. May be we can avoid that if we have the configuration? What do you feel Chunhui?

          Show
          ramkrishna.s.vasudevan added a comment - @Chunhui Idea is good. But do we need boolean waitUntilNoRegionsInTransition( final long timeout) such wait mechanisms? Also can we have a configuration like how we have in the master start up to ensure whether to use roundRobin or not. Because if i have a custom load balancer i will get a call back in this case for round robin. May be we can avoid that if we have the configuration? What do you feel Chunhui?
          Hide
          Ted Yu added a comment -

          Good idea.

          +    if(regions.isEmpty()) return;
          

          Please insert space between if and (.

          +      servers.removeAll(failedPlans.keySet());
          

          I think we should check whether servers is empty before proceeding.
          I suggest letting quickAssign() return boolean to indicate whether 'quick' assignment is successful.

          Show
          Ted Yu added a comment - Good idea. + if (regions.isEmpty()) return ; Please insert space between if and (. + servers.removeAll(failedPlans.keySet()); I think we should check whether servers is empty before proceeding. I suggest letting quickAssign() return boolean to indicate whether 'quick' assignment is successful.
          Hide
          chunhui shen added a comment -

          In the HBASE-5914.patch,

          We create a method AssignmentManager#quickAssign:
          It first bulk assign regions with bulk plan to available servers,
          If some is failed, remove servers from available servers and retry

          Show
          chunhui shen added a comment - In the HBASE-5914 .patch, We create a method AssignmentManager#quickAssign: It first bulk assign regions with bulk plan to available servers, If some is failed, remove servers from available servers and retry

            People

            • Assignee:
              chunhui shen
              Reporter:
              chunhui shen
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development