Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: regionserver
    • Labels:
    • Hadoop Flags: Reviewed
    • Release Note:
      [ADVANCED USERS ONLY] This patch adds a new experimental module, hbase-rsgroup. It is an advanced feature for partitioning regionservers into distinct groups for strict isolation, and should only be used by users who understand the full implications and have sufficient background in managing HBase clusters.

      RSGroups can be defined and managed with shell commands or the corresponding Java APIs. A server is added to a group by its hostname and port pair, and tables can be moved to the group so that only regionservers in that rsgroup can host the table's regions. RegionServers and tables can only belong to one group at a time. By default, all tables and regionservers belong to the "default" group. System tables can also be put into a group using the regular APIs. A custom balancer implementation tracks assignments per rsgroup and moves regions to the relevant regionservers in that group. The group information is stored in a regular HBase table, and a zookeeper-based read-only cache is used at cluster bootstrap time.
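      The invariants described above (every server and table in exactly one group, "default" as the fallback, regions only hosted inside the table's group) can be sketched in a few lines of plain Java. This is an illustrative model only — the class and method names are hypothetical, not the hbase-rsgroup API:

```java
import java.util.HashMap;
import java.util.Map;

public class RSGroupSketch {
    static final String DEFAULT_GROUP = "default";
    // "host:port" -> group name
    private final Map<String, String> serverToGroup = new HashMap<>();
    // table name -> group name
    private final Map<String, String> tableToGroup = new HashMap<>();

    // Moving a server or table simply re-keys it: nothing is ever in two groups at once.
    public void moveServer(String hostPort, String group) { serverToGroup.put(hostPort, group); }
    public void moveTable(String table, String group)     { tableToGroup.put(table, group); }

    // Anything not explicitly assigned belongs to the "default" group.
    public String groupOfServer(String hostPort) { return serverToGroup.getOrDefault(hostPort, DEFAULT_GROUP); }
    public String groupOfTable(String table)     { return tableToGroup.getOrDefault(table, DEFAULT_GROUP); }

    // A region of a table may only be hosted by a server in the table's group.
    public boolean mayHost(String hostPort, String table) {
        return groupOfServer(hostPort).equals(groupOfTable(table));
    }
}
```

      The real implementation persists these mappings in the 'hbase:rsgroup' table rather than in memory, but the membership rules are the same.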

      To enable, add the following to your hbase-site.xml and restart your Master:


       <property>
         <name>hbase.coprocessor.master.classes</name>
         <value>org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint</value>
       </property>
       <property>
         <name>hbase.master.loadbalancer.class</name>
         <value>org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer</value>
       </property>


      Then use the shell 'rsgroup' commands to create and manipulate regionserver groups. For example, to add a group and then add a server to it:

       hbase(main):008:0> add_rsgroup 'my_group'
       Took 0.5610 seconds

      This adds a group to the 'hbase:rsgroup' system table. Add a server (hostname + port) to the group using the 'move_rsgroup_servers' command as follows:

       hbase(main):010:0> move_rsgroup_servers 'my_group',['k.att.net:51129']

      Description

      In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation.

      The proposal is essentially to have an AssignmentManager that is aware of RegionServer groups and assigns tables to region servers based on those groupings. Load balancing will occur on a per-group basis as well.
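      A minimal sketch of what per-group balancing means: partition the regions by their table's group, then assign round-robin only among that group's servers. The names below are hypothetical and this is not the actual RSGroupBasedLoadBalancer, just the core idea:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupRoundRobin {
    /**
     * @param regionToGroup  each region's rsgroup (derived from its table)
     * @param groupToServers the servers belonging to each rsgroup
     * @return region -> server plan; an assignment never crosses a group boundary
     */
    public static Map<String, String> assign(Map<String, String> regionToGroup,
                                             Map<String, List<String>> groupToServers) {
        Map<String, String> plan = new HashMap<>();
        Map<String, Integer> cursor = new HashMap<>(); // round-robin position per group
        for (Map.Entry<String, String> e : regionToGroup.entrySet()) {
            List<String> servers = groupToServers.get(e.getValue());
            int i = cursor.merge(e.getValue(), 1, Integer::sum) - 1;
            plan.put(e.getKey(), servers.get(i % servers.size()));
        }
        return plan;
    }
}
```

      Because the cursor is kept per group, load is spread evenly within each group while isolation between groups is preserved by construction.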

      This is essentially a simplification of the approach taken in HBASE-4120. See attached document.

      1. 6721-master-webUI.patch
        3 kB
        chunhui shen
      2. balanceCluster Sequence Diagram.svg
        81 kB
        Guorui Wu
      3. HBASE-6721_0.98_2.patch
        876 kB
        Nick Dimiduk
      4. HBASE-6721_10.patch
        781 kB
        Francis Liu
      5. HBASE-6721_11.patch
        942 kB
        Francis Liu
      6. HBASE-6721_12.patch
        877 kB
        Francis Liu
      7. HBASE-6721_13.patch
        826 kB
        Francis Liu
      8. HBASE-6721_14.patch
        812 kB
        Francis Liu
      9. HBASE-6721_15.patch
        838 kB
        Francis Liu
      10. HBASE-6721_8.patch
        778 kB
        Francis Liu
      11. HBASE-6721_9.patch
        779 kB
        Francis Liu
      12. HBASE-6721_9.patch
        775 kB
        Francis Liu
      13. HBASE-6721_94_2.patch
        150 kB
        Francis Liu
      14. HBASE-6721_94_3.patch
        131 kB
        Francis Liu
      15. HBASE-6721_94_3.patch
        352 kB
        Francis Liu
      16. HBASE-6721_94_4.patch
        180 kB
        Francis Liu
      17. HBASE-6721_94_5.patch
        184 kB
        Francis Liu
      18. HBASE-6721_94_6.patch
        196 kB
        Francis Liu
      19. HBASE-6721_94_7.patch
        201 kB
        Francis Liu
      20. HBASE-6721_94.patch
        162 kB
        Francis Liu
      21. HBASE-6721_94.patch
        117 kB
        Francis Liu
      22. HBASE-6721_98_1.patch
        941 kB
        Francis Liu
      23. HBASE-6721_98_2.patch
        876 kB
        Francis Liu
      24. HBASE-6721_hbase-6721_addendum.patch
        161 kB
        Francis Liu
      25. HBASE-6721_trunk.patch
        771 kB
        Francis Liu
      26. HBASE-6721_trunk.patch
        225 kB
        Francis Liu
      27. HBASE-6721_trunk.patch
        225 kB
        Francis Liu
      28. HBASE-6721_trunk1.patch
        820 kB
        Francis Liu
      29. HBASE-6721_trunk2.patch
        820 kB
        Francis Liu
      30. HBASE-6721-DesigDoc.pdf
        91 kB
        Francis Liu
      31. HBASE-6721-DesigDoc.pdf
        89 kB
        Francis Liu
      32. HBASE-6721-DesigDoc.pdf
        86 kB
        Francis Liu
      33. HBASE-6721-DesigDoc.pdf
        171 kB
        Vandana Ayyalasomayajula
      34. HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml
        27 kB
        Guorui Wu
      35. hbase-6721-v15-branch-1.1.patch
        848 kB
        Enis Soztutar
      36. hbase-6721-v16.patch
        840 kB
        Enis Soztutar
      37. hbase-6721-v17.patch
        841 kB
        Enis Soztutar
      38. hbase-6721-v18.patch
        856 kB
        Francis Liu
      39. hbase-6721-v19.patch
        857 kB
        Francis Liu
      40. hbase-6721-v20.patch
        859 kB
        Francis Liu
      41. hbase-6721-v21.patch
        860 kB
        Francis Liu
      42. hbase-6721-v22.patch
        861 kB
        Francis Liu
      43. hbase-6721-v23.patch
        863 kB
        Francis Liu
      44. hbase-6721-v25.patch
        863 kB
        Francis Liu
      45. hbase-6721-v26_draft1.patch
        854 kB
        Francis Liu
      46. hbase-6721-v26.patch
        853 kB
        Francis Liu
      47. hbase-6721-v27.patch
        853 kB
        Enis Soztutar
      48. hbase-6721-v27.patch
        853 kB
        Francis Liu
      49. hbase-6721-v27.patch.txt
        853 kB
        Enis Soztutar
      50. hbase-6721-v28.patch
        852 kB
        Francis Liu
      51. hbase-6721-v28.patch
        852 kB
        Enis Soztutar
      52. hbase-6721-v29.patch
        852 kB
        Francis Liu
      53. immediateAssignments Sequence Diagram.svg
        94 kB
        Guorui Wu
      54. randomAssignment Sequence Diagram.svg
        90 kB
        Guorui Wu
      55. retainAssignment Sequence Diagram.svg
        150 kB
        Guorui Wu
      56. roundRobinAssignment Sequence Diagram.svg
        95 kB
        Guorui Wu

        Issue Links

          Activity

          avandana Vandana Ayyalasomayajula added a comment -

          Design document for HBase region server grouping feature.

          yuzhihong@gmail.com Ted Yu added a comment -

          More details should be added to the design.
          Have you considered introducing an interface for AssignmentManager so that existing and new managers can be easily swapped?
          Have you considered storing group information in zookeeper instead of on hdfs?

          Please explain more about RegionServerGroupProtocol.

          Thanks for the initiative.

          yuzhihong@gmail.com Ted Yu added a comment -

          Another aspect is fault tolerance.
          Say the smallest group consists of 6 region servers: the impact of a majority of those 6 servers going down at the same time is much higher than that of 6 servers going down out of the whole cluster when there is only one group.
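          This sizing concern can be made concrete with a toy calculation. Assuming (my assumption, not from the thread) that each server fails independently with probability p, the chance of losing at least k of a group's n servers is a binomial tail probability — and losing a majority of a 6-server group takes out most of that group's serving capacity, whereas the same 6 failures in a large single-group cluster are a small fraction of it:

```java
import java.util.stream.IntStream;

public class GroupFailureSketch {
    // n choose k, computed multiplicatively (exact for the small n used here).
    static double binom(int n, int k) {
        double c = 1;
        for (int i = 1; i <= k; i++) c = c * (n - k + i) / i;
        return c;
    }

    // P(at least k of n servers are down), each failing independently with probability p.
    static double pAtLeast(int n, int k, double p) {
        return IntStream.rangeClosed(k, n)
                .mapToDouble(i -> binom(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i))
                .sum();
    }
}
```

          For example, with p = 0.5 the probability of losing at least 4 of a 6-server group is 22/64 ≈ 0.34; the point is that the group, not the cluster, becomes the unit over which failures must be tolerated.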

          toffer Francis Liu added a comment -

          > Have you considered introducing interface for AssignmentManager so that existing and new managers can be easily swapped?
          Yes, part of the proposal is to make AssignmentManager pluggable. I'll add that as a subtask for this.

          > Have you considered storing group information in zookeeper instead of on hdfs ?
          Correct me if I'm wrong, but the approach HBase has taken in its use of ZK is more toward storing transient data for coordination, with the real source of truth on HDFS or in tables. We decided to follow the same approach.

          > Please explain more about RegionServerGroupProtocol.
          RegionServerGroupProtocol exposes APIs to manage grouping (see the API in the doc). The current plan is to use and expose these APIs via the CLI commands.

          > Another aspect is fault tolerance.
          > Say the smallest group consists of 6 region servers, the impact of majority of the 6 servers going down at the same time is much higher than 6
          > servers out of whole cluster going down where there is only one group.
          This is similar to hbase cluster sizing for fault tolerance. Let's play around with it and later on document best practices.

          yuzhihong@gmail.com Ted Yu added a comment -

          Looking at current doc, GroupInfo would be passed to (new) AssignmentManager.
          Do you plan to reference GroupInfo in the AssignmentManager interface?

          toffer Francis Liu added a comment -

          GroupInfo may be retrieved internally through an instance of the GroupInfoManager. This shouldn't leak through the interface. On a similar note Stack was suggesting something different to making it pluggable in HBASE-6723.

          yuzhihong@gmail.com Ted Yu added a comment -

          through an instance of the GroupInfoManager

          If I understand correctly, GroupInfoManager singleton would be used inside new AssignmentManager.

          yuzhihong@gmail.com Ted Yu added a comment -

          This is similar to hbase cluster sizing for fault tolerance.

          I think what I wanted to say was that we may need to reserve some region servers from the default pool for emergency scenarios, so that each group can keep a decent size at all times.

          toffer Francis Liu added a comment -

          If I understand correctly, GroupInfoManager singleton would be used inside new AssignmentManager

          Yep it will create an instance of it internally.

          toffer Francis Liu added a comment -

          I think what I wanted to say was that we may need to reserve some region servers from the Default pool for emergency scenario so that each group can have decent size all the time.

          I see, that could be something we add later on. There are some scenarios we have to think through more thoroughly. Conservative group sizing and manual intervention should be good enough for now? Let's get the basics fully baked first.

          avandana Vandana Ayyalasomayajula added a comment -

          At the recent HBase contributor meet-up, the topic of making the assignment manager pluggable was discussed. There was agreement that the assignment manager is a core and complex component of HBase, and that making it pluggable is not a good idea. Instead, it was suggested to move the region placement logic into the load balancer, which is already pluggable. This idea works for this feature with some changes to the LoadBalancer interface.

          I will update the design document with the changes I have mentioned.

          stack stack added a comment -

          Lads, do you think this is a 0.96 issue? Do you think it will be done in time (within a couple of weeks or so)? If not, can we move it out? Thanks.

          toffer Francis Liu added a comment -

          Stack, we'd like to get this into 0.96 (and if possible 0.94, as that's what we're planning on deploying). We should have a usable patch in ~2 weeks. Does that sound acceptable?

          stack stack added a comment -

          That should work.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Will the group info be persisted in the META table?

          toffer Francis Liu added a comment -

          Stack, great

          Ram, it'll be persisted in a file. Our thinking was that the approach would be more invasive (and complex) if the information were persisted in a table, since we would have to know the group of a table before we could assign its regions to a region server and read it.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Ok, so if a region gets split, the daughter regions fall into the parent's group? So the group info file will be updated based on this info?
          Also, groups cannot be updated once created, right?

          toffer Francis Liu added a comment -

          Group membership is based on tables, so no update to the file is necessary for a split. Groups can be updated: each update is applied in memory and flushed to the file for persistence.

          toffer Francis Liu added a comment -

          0.94 patch for initial review

          toffer Francis Liu added a comment -

          0.94 patch for initial review. We decided to combine the patches of the two subtasks into one.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550599/HBASE-6721_94.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3137//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550601/HBASE-6721_94.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3138//console

          This message is automatically generated.

          yuzhihong@gmail.com Ted Yu added a comment -
          +  public LoadBalancer getBalancer() {
          +    return balancer;
          +  }
          

          The above method can be package private.

          +public class GroupAdminEndpoint extends BaseEndpointCoprocessor implements GroupAdminProtocol, EventHandler.EventHandlerListener {
          + private static final Log LOG = LogFactory.getLog(GroupAdminClient.class);
          

          Please add javadoc for the class. The line is beyond 100 characters.
          The Log is initialized with the wrong class (GroupAdminClient instead of GroupAdminEndpoint).

          +  private ConcurrentMap<String,String> serversInTransition =
          

          What does the value in the serversInTransition map represent?

          +   List<HRegionInfo> regions = new ArrayList<HRegionInfo>();
          +   if (groupName == null) {
          +      throw new NullPointerException("groupName can't be null");
          

          nit: move ArrayList creation after the if statement.

          +  public Collection<String> listTablesOfGroup(String groupName) throws IOException {
          

          The return type is a collection, more generic than List that listOnlineRegionsOfGroup() returns. I guess there might be a reason.

          +      HTableDescriptor[] tables = master.getTableDescriptors().getAll().values().toArray(new HTableDescriptor[0]);
          

          nit: line too long.

          +  public GroupInfo getGroup(String groupName) throws IOException {
          

          Suggest renaming the method to getGroupInfo(); getGroup() is kind of vague.

          More reviews to follow.

          yuzhihong@gmail.com Ted Yu added a comment -

          In GroupAdminEndpoint:

          +     throw new IOException(
          +         "The region server or the target to move found to be null.");
          

          It would be nice to point out which parameter is null.

          +        throw new DoNotRetryIOException("Group must have no associated tables.");
          

          Include group name in the exception message.

          +  public Map<String, String> listServersInTransition() throws IOException {
          

          Return type of Map includes additional information which is not used by callers. Suggest returning keySet.
          Down in GroupAdminClient:

          +      for(String server: proxy.listServersInTransition().keySet()) {
          +        found = found || servers.contains(server);
          +      }
          

          Can you tell me what the body is supposed to achieve ?
          Back to GroupAdminEndpoint:

          +  private GroupInfoManager getGroupInfoManager() {
          +    return ((GroupBasedLoadBalancer)menv.getMasterServices().getAssignmentManager().getBalancer()).getGroupInfoManager();
          

          Does GroupInfoManager belong to the balancer? The above is probably the longest indirection I have ever seen.

          +  private List<HRegionInfo> getOnlineRegions(String hostPort) throws IOException {
          

          The above method is only called by listOnlineRegionsOfGroup() in a loop over online servers, resulting in a nested loop.
          Please consider collapsing the nested loop into one loop.

          +      LOG.error("Failed to complete GroupMoveServer with of "+h.getPlan().getServers().size()+
          

          nit: remove ' of ' in above sentence.

          toffer Francis Liu added a comment -

          Ted Yu

          What does the value in serversInTransition map represent?

          It represents servers that are being moved from one group to another.

          Can you tell me what the body is supposed to achieve ?
          Back to GroupAdminEndpoint:

          Retrieving the balancer during start() returns null, so I have to retrieve it lazily as needed.

          Does GroupInfoManager belong to balancer ? The above is probably the longest indirection I have ever seen

          We had to do this since we wanted to touch AssignmentManager as little as possible.

          toffer Francis Liu added a comment -

          We had to do this since we wanted to touch AssignmentManager as little as possible.

          As an alternative, we could add a getBalancer() method to MasterServices. Thoughts?

          yuzhihong@gmail.com Ted Yu added a comment -
          +  public void beforeProcess(EventHandler event) {
          

          I think normally the above should be called preProcess().

          +  public void afterProcess(EventHandler event) {
          

          Rename to postProcess().

          + * Copyright 2011 The Apache Software Foundation
          

          The above is no longer needed in the license header.

          +public interface GroupAdminProtocol extends GroupAdmin, CoprocessorProtocol {
          +}
          

          I wasn't expecting a Protocol to have no methods in it.

          +public class GroupBasedLoadBalancer implements LoadBalancer {
          

          Add javadoc for GroupBasedLoadBalancer.

          +      } catch (IOException e) {
          +        LOG.warn("IOException while creating GroupInfoManagerImpl.", e);
          +      }
          

          I think if groupManager cannot be initialized, we should abort master because group policy wouldn't be enforced.
          In correctAssignments():

          +        if ((info == null) || (!info.containsServer(sName.getHostAndPort()))) {
          +          // Misplaced region.
          +          misplacedRegions.add(region);
          

          Under what scenario would a region be misplaced at runtime? I think rebalancing misplaced region(s) would affect normal operation of the related groups.

          +    //unassign misplaced regions, so that they are assigned to correct groups.
          +    this.services.getAssignmentManager().unassign(misplacedRegions);
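          For reference, the misplaced-region scan being questioned can be sketched as below; the types and names are illustrative stand-ins, not the patch's actual classes. A region counts as misplaced when the server hosting it is not in its table's group, and such regions are unassigned so the balancer can re-place them.

```java
import java.util.*;

// Sketch of the misplaced-region check in correctAssignments(): a region is
// "misplaced" when its current server is not a member of the region's group.
// GroupInfo here is a minimal stand-in for the patch's class.
public class MisplacedRegions {
    static final class GroupInfo {
        private final Set<String> servers;
        GroupInfo(Set<String> servers) { this.servers = servers; }
        boolean containsServer(String hostPort) { return servers.contains(hostPort); }
    }

    // assignments: region -> hostPort of its current server
    // groupOf: region -> the GroupInfo of its table's group (may be null)
    static List<String> findMisplaced(Map<String, String> assignments,
                                      Map<String, GroupInfo> groupOf) {
        List<String> misplaced = new ArrayList<>();
        for (Map.Entry<String, String> e : assignments.entrySet()) {
            GroupInfo info = groupOf.get(e.getKey());
            if (info == null || !info.containsServer(e.getValue())) {
                misplaced.add(e.getKey());   // candidate for unassign/reassign
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        GroupInfo g = new GroupInfo(new HashSet<>(Arrays.asList("rs1:60020")));
        Map<String, String> assignments = new LinkedHashMap<>();
        assignments.put("regionA", "rs1:60020"); // inside its group
        assignments.put("regionB", "rs9:60020"); // outside its group
        Map<String, GroupInfo> groupOf = new HashMap<>();
        groupOf.put("regionA", g);
        groupOf.put("regionB", g);
        System.out.println(findMisplaced(assignments, groupOf)); // [regionB]
    }
}
```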
          
          yuzhihong@gmail.com Ted Yu added a comment -

          Can you tell me what the body is supposed to achieve?

          I was asking about the following line of code:

          found = found || servers.contains(server);
          

          It seems to be condition checking.

          As an alternative, we can add a getBalancer() method to MasterServices.

          That would be better than the current form.

          eclark Elliott Clark added a comment -

          Just some initial thoughts:

          • I couldn't seem to get it to compile for me on 0.94
          • There seem to be a bunch of formatting changes that aren't needed.
          • Passing the preferred server into the load balancer on randomAssignment is messy. If we know the preferred server, why call this function at all?
          • The balancer is a public interface and we can't make changes to it in a minor release. And this patch won't apply to trunk.
          • With this many interfaces and classes it might make sense to move them into a namespace.
          • Why is GroupAdminClient in the master namespace and not in the client namespace?
          • Why a co-processor and not build it in ?
            • Security was done that way because it can be added or removed. As the patch stands, that's not really possible
            • This makes a lot of changes in core code for something that is a co-processor.
          • Don't create a DefaultLoadBalancer in GroupBasedLoadBalancer. The balancer was made pluggable and that feature shouldn't go away.
          • Why return ArrayListMultimap from groupRegions in GroupBasedLoadBalancer? Why not the base class?
          • HTableDescriptor seems like the correct location for info about the table if you don't want to put that data into meta.
          • Putting things into the filesystem seems like the wrong way to do it. There are just so many different moving parts in getting things from HDFS, with caching and cache invalidation, and edge cases on failure.
          • There's a lot of logic about balancing bleeding into the AssignmentManager. Right now assignment manager is already too complex. I would much prefer a solution that had everything in the balancer.
          yuzhihong@gmail.com Ted Yu added a comment -

          There's a lot of logic about balancing bleeding into the AssignmentManager.

          Looking at the changes in AssignmentManager, they are mostly white space removal.
          There is only one real change:

          +  public LoadBalancer getBalancer() {
          +    return balancer;
          +  }
          

          which Francis agrees to move out.

          toffer Francis Liu added a comment -

          I was asking about the following line of code:

          found = found || servers.contains(server);

          It seems to be condition checking.

          Yeah, it's basically checking whether any of the servers to be moved is already in the ServersInTransition list, meaning it is already being moved, so we shouldn't allow that.
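          A compact way to express that check is sketched below. This is illustrative, not the patch's API; note that Collections.disjoint also short-circuits on the first common element, unlike the found = found || ... loop.

```java
import java.util.*;

// Sketch of the servers-in-transition guard: reject a move request when any
// requested server is already being moved between groups.
public class TransitionGuard {
    static boolean anyInTransition(Set<String> inTransition, Collection<String> requested) {
        // disjoint() returns false as soon as a common element is found
        return !Collections.disjoint(inTransition, requested);
    }

    public static void main(String[] args) {
        Set<String> inTransition = new HashSet<>(Collections.singletonList("rs1:60020"));
        List<String> requestA = Arrays.asList("rs1:60020", "rs2:60020");
        List<String> requestB = Arrays.asList("rs3:60020");
        System.out.println(anyInTransition(inTransition, requestA)); // true
        System.out.println(anyInTransition(inTransition, requestB)); // false
    }
}
```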

          eclark Elliott Clark added a comment -

          Looking at the changes in AssignmentManager, they are mostly white space removal.
          There is only one real change:

          You're missing the change where null plans are no longer queued, which comes about because of this patch.

          toffer Francis Liu added a comment -

          Thanks for the comments Elliott Clark

          I couldn't seem to get it to compile for me on 0.94

          Did you apply the patches attached in the subtasks prior to applying this patch? If you'd like them all in one single patch, I can do that as well.

          There seem to be a bunch of formatting changes that aren't needed.

          Will clean that up in the next update.

          Passing the preferred server into the load balancer on randomAssignment is messy. If we know the preferred server, why call this function at all?

          Good point, will remove that argument.

          The balancer is a public interface and we can't make changes to it in a minor release. And this patch won't apply to trunk.

          I see, we can make it binary compatible at least by supporting both interfaces if you're amenable to that. We're planning on getting 0.94 into production and it'd be great if we didn't have a lot of custom patches on top of it.

          With this many interfaces and classes it might make sense to move them into a namespace.

          Will look into doing this; my main concern is whether there are any dependencies on package-private methods.

          Why a co-processor and not build it in?
          Security was done that way because it can be added or removed. As the patch stands, that's not really possible
          This makes a lot of changes in core code for something that is a co-processor.

          As part of the design, HBase should run fine without the group-based classes enabled (endpoint, balancer, etc). If that is not the case, then that's a bug. As for the code changes in core code: some may be unavoidable, but we could probably still make them less invasive (i.e. remove the EventHandler changes). Having said that, I don't mind if the community would like to have this fully integrated into HBase, just let us know.

          Don't create a DefaultLoadBalancer in GroupBasedLoadBalancer. The balancer was made pluggable and that feature shouldn't go away.

          The balancer is still pluggable, it's just not pluggable within the GroupBasedLoadBalancer. Though it should be ok to make that pluggable as well.

          HTableDescriptor seems like the correct location for info about the table if you don't want to put that data into meta.

          Yes, we have the group affiliation stored as a table property. The group information itself, though, is stored on HDFS.

          putting things into the filesystem seems like the wrong way to do it. There are just so many different moving parts with getting things from hdfs with caching and cache invalidation, and edge cases on failure.

          I see, where do you suggest we put it? ZooKeeper? We mainly had it in HDFS since ZK seemed to be the place to store only ephemeral data. Putting the data in tables would be a lot more complex and would require more core code changes.

          toffer Francis Liu added a comment -

          You're missing the change where null plans are no longer queued which comes about because of this patch.

          We needed this change to prevent regions from being assigned to region servers they don't belong to. We can continue to recognize null; we just need another way to prevent regions from being assigned to the wrong group of region servers. One option is to have a dead/bogus server as part of the plan if no online servers are available for a given group; this way the region eventually gets reassigned once a live server is up. Would that work?
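          The dead/bogus-server fallback proposed here might look roughly like the sketch below. This is a sketch under stated assumptions: BOGUS_SERVER and the method names are hypothetical, not from the patch.

```java
import java.util.*;

// Sketch of the fallback: when a region's group has no online server, plan it
// onto a known-offline placeholder so it stays unassigned (rather than landing
// in the wrong group) until a server in its group comes back online.
public class GroupPlacement {
    static final String BOGUS_SERVER = "bogus.server.example:1"; // hypothetical

    static String pickServer(List<String> onlineInGroup, Random rand) {
        if (onlineInGroup.isEmpty()) {
            return BOGUS_SERVER;             // never a live regionserver
        }
        return onlineInGroup.get(rand.nextInt(onlineInGroup.size()));
    }

    public static void main(String[] args) {
        Random rand = new Random(42);
        System.out.println(pickServer(Collections.emptyList(), rand));
        System.out.println(pickServer(Arrays.asList("rs1:60020"), rand));
    }
}
```

          The point of the placeholder is that the assignment manager keeps retrying the region, so it is picked up automatically once a server in the group rejoins.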

          eclark Elliott Clark added a comment -

          Did you apply the patches attached in the subtasks prior to applying this patch? If you'd like them all in one single patch, I can do that as well.

          Nope I had missed those.

          As part of the design, HBase should run fine without the group based classes enabled (endpoint, balancer, etc).

          Then they should be a separate module. This half-in, half-removable state is just a little worrisome.

          The balancer is still pluggable it's just not pluggable for the GroupBasedLoadBalancer

          If this is going to be a feature then it will need to work with all balancers.

          I see, where do you suggest we put it? ZooKeeper? We mainly had it in HDFS since ZK seemed to be the place to store only ephemeral data. Putting the data in tables would be a lot more complex and would require more core code changes.

          I would prefer something like security has. We have a perfectly usable key/value storage system no need to use a different storage path.

          yuzhihong@gmail.com Ted Yu added a comment -

          Then they should be a separate module.

          In 0.94, there is no support for modules.
          It would be nice to keep implementation roughly the same for 0.94 and 0.96. This way users can try this feature earlier and it is easier to maintain across major releases.

          toffer Francis Liu added a comment -

          I would prefer something like security has. We have a perfectly usable key/value storage system no need to use a different storage path.

          That was what we had originally in mind, but we decided to punt on it due to the complexity. The main problem here is that during startup, before the catalog tables get assigned, we would need to know what group the tables are members of. And since the "group" table isn't assigned yet, we have to get creative. In my mind, one way to do that is to manually read the region on HDFS to retrieve the group information. Does that sound feasible?

          apurtell Andrew Purtell added a comment -

          And since the "group" table isn't assigned yet we have to get creative. In my mind one way to do that is manually read the region on hdfs to retrieve the group information.

          How do you handle META? Why not handle the "group" table the same way?

          toffer Francis Liu added a comment -

          In the current implementation we handle META the same way as all the other tables, since we just load the group information from HDFS. If the "group" information were stored in a table, we wouldn't be able to access it through normal means until after we assign ROOT, META and the "group" table itself.

          apurtell Andrew Purtell added a comment -

          If the "group" information was stored on a table we won't be able to access the group information through normal means until after we assign ROOT, META and the "group" table itself.

          Right, so you could assign ROOT, META, and the "group" table randomly at first, and then calculate placement and moves of the rest and perhaps also META and "group" after the "group" table becomes available, right? Worst case that would be 3 moves?

          apurtell Andrew Purtell added a comment - - edited

          The current attached patch for 0.94 makes a ton of changes in o.a.h.h.master.

          This stuff should all be pulled out into a separate package that can be put into a Maven module like done with security in 0.94.

          What changes to core code remain after that?

          toffer Francis Liu added a comment -

          Updated patch addressed most of the comments. Mainly:

          • moved storage of group information from hdfs to zookeeper
          • cleaned up whitespace and removed unnecessary changes to core master classes
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552071/HBASE-5498_94_4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 20 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3224//console

          This message is automatically generated.

          toffer Francis Liu added a comment -

          This stuff should all be pulled out into a separate package that can be put into a Maven module like done with security in 0.94.

          Isn't that a bit too much overhead? Wasn't the reason security was a separate module that it had to depend on a secure version of Hadoop, which the community didn't want?

          What changes to core code remain after that?

          I've updated the patch; the remaining changes should be minimal and non-invasive.

          Right, so you could assign ROOT, META, and the "group" table randomly at first, and then calculate placement and moves of the rest and perhaps also META and "group" after the "group" table becomes available, right? Worst case that would be 3 moves?

          Since we are not making any changes to the assignment manager, this actually is complicated, since we won't really know what state assignment is in based only on calls to the load balancer and the information we can glean through the master services.

          toffer Francis Liu added a comment -

          Anyone know how to get a git patch onto Review Board? It seems it only works for trunk.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552073/HBASE-6721_94_2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 20 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3226//console

          This message is automatically generated.

          apurtell Andrew Purtell added a comment -

          Looks like the HBASE-6721_94_2 patch contains the patch on HBASE-7042 too?

          MasterExec and MasterExecResult are still in the HBASE-6721_94_2 patch, please use Exec and ExecResult and remove MasterExec and MasterExecResult and the change to HbaseObjectWritable.

          moved storage of group information from hdfs to zookeeper

          Nice, this moves forward in the right direction IMHO. However, I don't see where this information is persisted into a table. Did I miss it? Because HBase state in ZooKeeper can be cleared by administrators in some recovery scenarios, we can't consider it persistent storage. See how the security coprocessor handles this.

          This stuff should all be pulled out into a separate package that can be put into a Maven module like done with security in 0.94.

          Isn't that a bit too much overhead?

          The patch adds group-based assignment coprocessor specific classes to the main Master package and the main Client package. I should clarify that I didn't say your work must be put into a separate Maven module, but rather coprocessors are mini-applications, and should be organized as such. It is important to keep this set of changes, distinct optional functionality, grouped together apart from unrelated core code. Why not org.apache.hadoop.hbase.client.group and org.apache.hadoop.hbase.master.group etc.? I hope this feedback is clearer now.

          What changes to core code remain after that?

          I've updated the patch, there should be minimal and no invasive changes.

          Thanks! Much better.

          Still some possible whitespace issues though; the patch may contain mixed tabs and spaces? Some of the formatting looks off when I view it. If you are using Eclipse, we have an HBase formatter in dev-support/ now that may be helpful.

          apurtell Andrew Purtell added a comment - - edited

          Regarding storing the group assignment information in ZooKeeper, I think this is a good strategy. You can optimize for the case where HBase (re)starts with valid group assignment data available in ZK. Therefore you can avoid some bootstrapping challenges. However, you must persist the group assignment information into a table, like how security does with acl. After starting up, if in the uncommon case there is group assignment information in the group (or similar) table that is not in sync with that in ZK (perhaps because ZK state was cleared), you should update ZK data accordingly. From there, there are a couple of options:

          • WARN the administrator that the table assignments should be updated via disable and enable.
          • Automatically trigger reassignment via disable and enable.
          • Region moves (if the assignment information is available - trunk only)

          Edit: Fix incorrect use of markup.
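
          The reconciliation described above can be sketched roughly as follows. Both stores are modeled as plain Maps; `reconcile` and the Map-based stand-ins are hypothetical illustrations, not HBase APIs:

```java
import java.util.HashMap;
import java.util.Map;

public class GroupBootstrap {
  /**
   * Returns true if ZK was stale and has been refreshed from the table.
   * The group table is authoritative; ZK is a cache that may have been
   * cleared by an administrator in a recovery scenario.
   */
  static boolean reconcile(Map<String, String> groupTable, Map<String, String> zkMirror) {
    if (groupTable.equals(zkMirror)) {
      return false;                 // common case: ZK already in sync
    }
    zkMirror.clear();               // ZK state was cleared or diverged:
    zkMirror.putAll(groupTable);    // rewrite it from the persistent table
    return true;                    // caller may then WARN or trigger reassignment
  }

  public static void main(String[] args) {
    Map<String, String> table = new HashMap<>(Map.of("server1:60020", "g1"));
    Map<String, String> zk = new HashMap<>();  // e.g. cleared by an admin
    boolean refreshed = reconcile(table, zk);
    System.out.println(refreshed + " " + zk.equals(table));
  }
}
```

          After the rewrite, the caller picks one of the three follow-up options above (warn, disable/enable, or region moves).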

          yuzhihong@gmail.com Ted Yu added a comment -

          @Francis:
          On https://reviews.apache.org, you can select hbase-git repository and upload the patch for 0.94
          Review board is able to generate proper diff.

          + * Copyright 2011 The Apache Software Foundation
          

          The above line is no longer needed.

          Thanks

          lhofhansl Lars Hofhansl added a comment -

          Can this also colocate regions of different tables, maybe based on a key prefix?
          In our case we'd have tenants share tables, and each row key would be prefixed by a tenant id.

          yuzhihong@gmail.com Ted Yu added a comment -

          In SecureGroupAdminEndpoint, the following code is called repeatedly:

          +  private AccessController getAccessController() {
          +    return (AccessController)menv.getMasterServices()
          +        .getCoprocessorHost().findCoprocessor(AccessController.class.getName());
          +  }
          

          Should AccessController be stored in a field variable ?
          For TestSecureGroupAdminEndpoint:

          +  @Test
          +  public void testGetAddRemove() throws Exception {
          

          Would testGroupGetterAdditionRemoval() be a better name for the above method ?

          For GroupAdmin interface:
          
          +   * List servers that are currently being moved to a new group
          +   * @return
          +   * @throws IOException
          +   */
          +  Map<String, String> listServersInTransition() throws IOException;
          

          Please describe the meaning of key and value in the returned Map.
          For GroupAdminClient.java:

          +      for(String server: proxy.listServersInTransition().keySet()) {
          +        found = found || servers.contains(server);
          +      }
          +      Thread.sleep(1000);
          

          We can break out of the for loop once found becomes true, right ?
          Would suggest using a shorter sleep interval above.
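
          For illustration, here is a sketch of the polling loop with both suggestions applied (an early return once a match is found, and a shorter sleep interval). `anyInTransition` and its parameters are hypothetical stand-ins for the snippet above, not the patch's actual code:

```java
import java.util.List;
import java.util.Set;

public class TransitionPoll {
  static boolean anyInTransition(Set<String> serversInTransition, List<String> servers)
      throws InterruptedException {
    for (int attempt = 0; attempt < 10; attempt++) {
      for (String server : serversInTransition) {
        if (servers.contains(server)) {
          return true;              // break out as soon as one match is found
        }
      }
      Thread.sleep(100);            // shorter interval than the original 1000 ms
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(anyInTransition(Set.of("host1:60020"), List.of("host1:60020")));
  }
}
```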
          For DefaultLoadBalancer.java:

          -  public ServerName randomAssignment(List<ServerName> servers) {
          +  public ServerName randomAssignment(HRegionInfo region, List<ServerName> servers) {
          

          The new parameter region is not used. We don't need to introduce this parameter, right ?
          BTW StochasticLoadBalancer doesn't have randomAssignment().

          Will provide more comments for next patch.

          avandana Vandana Ayyalasomayajula added a comment -

          Ted Yu In your previous comment, the extra parameter (HRegionInfo) is used by the GroupBasedLoadBalancer class to determine the group information.
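
          As a rough illustration of why the balancer wants the region parameter: a group-aware random assignment can use the region's table to look up its group and restrict the candidate servers to that group before picking one at random. Everything below is a simplified stand-in, not the actual GroupBasedLoadBalancer code:

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

public class GroupAwareRandomAssignment {
  static String randomAssignment(String table,
                                 List<String> servers,
                                 Map<String, String> tableToGroup,
                                 Map<String, String> serverToGroup,
                                 Random rng) {
    // The region's table determines the group; unassigned entries fall
    // back to the "default" group, mirroring the feature's description.
    String group = tableToGroup.getOrDefault(table, "default");
    List<String> candidates = servers.stream()
        .filter(s -> group.equals(serverToGroup.getOrDefault(s, "default")))
        .collect(Collectors.toList());
    if (candidates.isEmpty()) {
      return null;  // no server in the region's group is available
    }
    return candidates.get(rng.nextInt(candidates.size()));
  }

  public static void main(String[] args) {
    String picked = randomAssignment("t1",
        List.of("a:1", "b:1"),
        Map.of("t1", "g1"),
        Map.of("a:1", "g1", "b:1", "default"),
        new Random(42));
    System.out.println(picked);  // only a:1 is in group g1
  }
}
```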

          toffer Francis Liu added a comment -

          Still some possible whitespace issues though; the patch may contain mixed tabs and spaces? Some of the formatting looks off when I view it. If you are using Eclipse, we have an HBase formatter in dev-support/ now that may be helpful.

          I'm using IntelliJ and it doesn't seem to like the trailing whitespace in the original file. I couldn't figure out a way to turn it off.

          re: moving into another package - I see, that sounds reasonable. Will do that, but I will have to make some package private methods in assignment manager public.

          toffer Francis Liu added a comment -

          BTW StochasticLoadBalancer doesn't have randomAssignment().

          randomAssignment is part of the original interface, so StochasticLoadBalancer should have it. Also, I don't see that balancer in 0.94?

          toffer Francis Liu added a comment -

          Can this also colocate regions of different tables, maybe based on a key prefix?
          In our case we'd tenants share tables, and each row key would be a prefixed by a tenant id.

          Currently it determines group membership based on a table property. We can make things more generic and have membership determined from region metadata (i.e. startKey). Would that work for you? Would each tenant become a group? Or can multiple tenants be part of the same group?

          apurtell Andrew Purtell added a comment -

          re: moving into another package - I see that sounds reasonable. Will do that but I will have to make some package private methods in assignment manager public.

          I don't see that as a problem if you could also kindly tag them with private interface annotations.

          lhofhansl Lars Hofhansl added a comment -

          Currently it determines group membership based on a table property. We can make things more generic and have membership determined from region metadata (ie startKey). Would that work for you? Would each tenant become a group? Or can multiple tenants be part of the same group?

          That should work. I would need to think this through in a bit more detail. Might be a bit tricky, since many small tenants could be in the same region, in which case colocation might not be possible at all.

          yuzhihong@gmail.com Ted Yu added a comment -

          In trunk, BaseLoadBalancer has:

            public ServerName randomAssignment(HRegionInfo regionInfo, List<ServerName> servers) {
          

          StochasticLoadBalancer extends BaseLoadBalancer.
          so the addition of HRegionInfo region parameter in 0.94 should be fine.

          toffer Francis Liu added a comment -

          First stab at using a table to store the group information. GroupBasedLoadBalancer now works in two modes, online and offline. In offline mode, balance doesn't work, and random assignment is done only for the catalog table and group table; the rest are given a null/bogus assignment. Random assignment will then need to be corrected with a call to balance(), or we can just let the chore call balance eventually. The implementation is a bit clunky but probably the best choice if we want to keep the data in the tables. I've tested this on a distributed cluster and it seems to work. Let me know if this is ok so I can continue working on this and addressing the comments. Please call out other major concerns that I should be addressing. I'm hoping we can start working on a trunk patch soon.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552446/HBASE-6721_94_3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 71 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3249//console

          This message is automatically generated.

          apurtell Andrew Purtell added a comment - - edited

          The implementation is a bit clunky but probably the best choice if we want to keep the data in the tables.

          Francis Liu Keeping the group assignment data in a table (at least the authoritative record of it) seems appropriate and consistent with feedback. Have you also considered adding a state mirror in ZK to avoid the need for random assignment of catalog tables and the group table if it is available on (re)start?

          FYI, looks like the _3 patch picks up unrelated work.

          Edit: See my above recent comment in this regard.

          yuzhihong@gmail.com Ted Yu added a comment -

          @Francis:
          Please refresh your 0.94 workspace. Some changes in patch v3 were from HBASE-6796

          yuzhihong@gmail.com Ted Yu added a comment -
          +  public GroupMoveServerWorker(Server master, Map<String, String> serversInTransition,
          ...
          +    this.master = (MasterServices)master;
          

          master parameter should be declared as MasterServices.

          +        if(sourceGroup != null && !tmpGroup.equals(sourceGroup)) {
          +          throw new DoNotRetryIOException("Move server request should only come from one source group");
          

          Consider including sourceGroup and tmpGroup in the exception message.

          +      if(!sourceGroup.startsWith(GroupInfo.TRANSITION_GROUP_PREFIX)) {
          +        transGroup = GroupInfo.TRANSITION_GROUP_PREFIX+
          +            System.currentTimeMillis()+"_"+sourceGroup+"-"+plan.getTargetGroup();
          +        groupManager.addGroup(new GroupInfo(transGroup, new TreeSet<String>()));
          +      }
          +    }
          +    groupManager.moveServers(plan.getServers(), sourceGroup, transGroup!=null?transGroup:plan.getTargetGroup());
          

          Why move the servers to a transitional group ?

          +        } catch (InterruptedException e) {
          +          LOG.warn("Sleep interrupted", e);
          

          Restore interrupt status.
          In complete():

          +      if(success) {
          +        groupManager.moveServers(plan.getServers(), tmpSourceGroup, plan.getTargetGroup());
          +        if(transGroup != null) {
          +          groupManager.removeGroup(transGroup);
          

          If early action in run() wasn't successful, the transitional group would hang around ?
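
          The "restore interrupt status" point refers to re-setting the thread's interrupt flag when InterruptedException is caught and not rethrown, so callers further up the stack can still observe the interrupt. A minimal sketch (sleepQuietly is a hypothetical helper, not code from the patch):

```java
public class RestoreInterrupt {
  static boolean sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
      return true;
    } catch (InterruptedException e) {
      // Catching InterruptedException clears the flag; restore it so the
      // interrupt is not silently swallowed.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt();    // simulate an interrupt arriving
    boolean slept = sleepQuietly(10);
    System.out.println(slept + " " + Thread.currentThread().isInterrupted());
  }
}
```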

          yuzhihong@gmail.com Ted Yu added a comment -

          In GroupInfoManager:

          +  boolean moveServers(Set<String> hostPort, String srcGroup, String dstGroup) throws IOException;
          

          Please add javadoc for the moveServers() method.
          For GroupInfoManagerImpl, indentation should be 2 spaces.
          For moveServers:

          +    for(String el: hostPort) {
          +      foundOne = foundOne || src.removeServer(el);
          +      dst.addServer(el);
          

          If foundOne is true in the middle of the loop, would the call to src.removeServer(el) be skipped ? That would lead to the server being in both groups.
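
          This short-circuit concern can be reproduced in isolation: once foundOne is true, `foundOne || src.remove(el)` never evaluates the right-hand side again, so later servers stay in the source group. Evaluating the removal first avoids the skip. A plain Set stands in for GroupInfo here, and buggyMove/fixedMove are hypothetical names:

```java
import java.util.Set;
import java.util.TreeSet;

public class MoveServersShortCircuit {
  static void buggyMove(Set<String> src, Set<String> dst, Set<String> toMove) {
    boolean foundOne = false;
    for (String el : toMove) {
      foundOne = foundOne || src.remove(el);  // remove() skipped once foundOne is true
      dst.add(el);
    }
  }

  static void fixedMove(Set<String> src, Set<String> dst, Set<String> toMove) {
    boolean foundOne = false;
    for (String el : toMove) {
      boolean removed = src.remove(el);       // always evaluated
      foundOne = foundOne || removed;
      dst.add(el);
    }
  }

  public static void main(String[] args) {
    Set<String> toMove = new TreeSet<>(Set.of("a:1", "b:1"));
    Set<String> buggySrc = new TreeSet<>(toMove);
    buggyMove(buggySrc, new TreeSet<>(), toMove);
    Set<String> fixedSrc = new TreeSet<>(toMove);
    fixedMove(fixedSrc, new TreeSet<>(), toMove);
    // buggySrc still contains "b:1"; fixedSrc is empty
    System.out.println(buggySrc + " " + fixedSrc);
  }
}
```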

          toffer Francis Liu added a comment -

          Clean patch, sorry for the confusion.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552520/HBASE-6721_94_3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 14 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3257//console

          This message is automatically generated.

          toffer Francis Liu added a comment -

          Have you also considered adding a state mirror in ZK to avoid the need for random assignment of catalog tables and the group table if it is available on (re)start?

          I'm wondering if it's worth the added complexity it introduces. For security it was needed, since consumers of acl data were distributed and zk could do the orchestration. But in this case the data is only read and written by the active master, which to me makes mirroring things in ZK more of an enhancement. I don't feel strongly about it, but I'm more inclined to keep things simple for now and see how things go. Thoughts?

          toffer Francis Liu added a comment -

          Have you also considered adding a state mirror in ZK to avoid the need for random assignment of catalog tables and the group table if it is available on (re)start?

          Another approach is to mirror only the catalog and group table assignment information in ZK. This would add less complexity and minimize the cost of having inconsistent data between the two stores.

          jxiang Jimmy Xiang added a comment -

          Do you have a patch for 0.96 on review board?

          avandana Vandana Ayyalasomayajula added a comment -

          Jimmy Xiang – We have not started to work on the patch for trunk, as we wanted the patch for branch-94 to address all the review comments. Hopefully after one more round of review, we will start working on the patch for trunk.

          zhihyu@ebaysf.com Ted Yu added a comment -

          In GroupBasedLoadBalancer.roundRobinAssignment():

          +        for (HRegionInfo region : regions) {
          ...
          +          assignments.put(ServerName.parseServerName("127.0.0.1:1"),regions);
          

          Should regions for ROOT, .META. and group tables be excluded from the above put() ? These regions are handled earlier by the call to assignments.putAll().
          Should we confine these special tables to default group ?

          yuzhihong@gmail.com Ted Yu added a comment -

          I went over GroupBasedLoadBalancer.
          In correctAssignments():

          +        }catch(IOException exp){
          +          LOG.debug("Group information null for region of table " + region.getTableNameAsString(),
          +              exp);
          +        }
          +        if ((info == null) || (!info.containsServer(sName.getHostAndPort()))) {
          

          If groupManager.getGroup() fails to return group information, it is likely the exception would occur multiple times inside the loop. We should distinguish this from the case of !info.containsServer(sName.getHostAndPort()). Meaning, in case of exception, we don't conclude that the regions are misplaced.

          In roundRobinAssignment():

          +              regionGroup.get(groupKey), getServerToAssign(info, servers)));
          

          Rename getServerToAssign() -> getServersToAssign()

          +          assignments.put(ServerName.parseServerName("127.0.0.1:1"),regions);
          

          The above statement would be executed many times. Consider creating a singleton for ServerName.parseServerName("127.0.0.1:1").

          +      throw new IllegalStateException("Failed to access group store", e);
          

          The above would cause master to exit, right ? If you don't want to change the signature of roundRobinAssignment(), consider returning null.

          +        if(nulled.size() > 0) {
          

          style: insert space between if and (.

          offlineRetainAssignment() contains only one method call. Consider merging it into retainAssignment().

          +    //chances are most are not, then we just use balance to correct
          

          'are not': not correct or not incorrect ? Please clarify. It would be nice to end each line of comment with a period.

          +      ListMultimap<String, HRegionInfo> rGroup = ArrayListMultimap.create();
          

          Would groupToRegionMap be a better name for the above variable ?
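
          The singleton suggestion above amounts to parsing the bogus offline server name once into a constant instead of calling parseServerName on every iteration. A tiny stand-in for ServerName is used here, since the real class lives in HBase:

```java
public class BogusServer {
  // Simplified stand-in for org.apache.hadoop.hbase.ServerName.
  static final class ServerName {
    final String host; final int port;
    ServerName(String host, int port) { this.host = host; this.port = port; }
    static ServerName parseServerName(String s) {
      int idx = s.lastIndexOf(':');
      return new ServerName(s.substring(0, idx), Integer.parseInt(s.substring(idx + 1)));
    }
    @Override public String toString() { return host + ":" + port; }
  }

  // Parsed once, reused for every region that gets a bogus assignment.
  static final ServerName BOGUS_SERVER_NAME = ServerName.parseServerName("127.0.0.1:1");

  public static void main(String[] args) {
    System.out.println(BOGUS_SERVER_NAME);
  }
}
```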

          yuzhihong@gmail.com Ted Yu added a comment -

          More comments about GroupBasedLoadBalancer.
          In randomAssignment():

          +      } else if(SPECIAL_TABLES.contains(region.getTableName())){
          +        candidateList = servers;
          

          If you limit special tables to default group, there is no need to move them after system becomes online.
          For GroupStartupWorker, isOnline is set to true after waitForGroupTableOnline() returns. In waitForGroupTableOnline():

          +          if(found.get()) {
          +            ((GroupBasedLoadBalancer)masterServices.getLoadBalancer()).getGroupInfoManager();
          +          }
          

          Would the above call be effective (isOnline is false at this point) ?

          toffer Francis Liu added a comment -

          Should we confine these special tables to default group ?

          That's a good point, I just realized that it's not possible to add table properties to the catalog tables. Looks like I will have to move group membership into the group metadata table as well. It felt clunky that changing group membership would be considered a schema change anyway.

          toffer Francis Liu added a comment -

The patch addresses most of the comments. Mainly:

- ZooKeeper as temporary storage for special table group assignments.
- Storing group table membership in the group table instead of the table descriptor.

          toffer Francis Liu added a comment -

Review board: https://reviews.apache.org/r/8389/

          yuzhihong@gmail.com Ted Yu added a comment -

          I got a test failure in one of the newly added tests:

          testOffline(org.apache.hadoop.hbase.master.TestGroupsOfflineMode)  Time elapsed: 1.283 sec  <<< ERROR!
          java.lang.NullPointerException
            at org.apache.hadoop.hbase.master.TestGroupsOfflineMode.testOffline(TestGroupsOfflineMode.java:116)
          
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12559999/HBASE-6721_94_5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 16 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3518//console

          This message is automatically generated.

          toffer Francis Liu added a comment -

Updated the patch; also updated the review board.

          toffer Francis Liu added a comment -

          Will update documentation tomorrow.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567985/HBASE-6721_94_6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4329//console

          This message is automatically generated.

          sershe Sergey Shelukhin added a comment -

Started reviewing; got as far as GroupInfoManagerImpl.java. This will probably take several passes to get through the entire patch.
My main question is: what is the state of the trunk patch? It looks like a huge feature, so it should go into trunk first.
Another question (maybe, since I haven't seen the rest of the patch yet): why is JSON serialization used? Will it be different in trunk? We already have PB and Writables, so I wonder how it works with them.
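
To make the serialization question concrete, a group record could be rendered to JSON roughly as below. This is a hand-rolled hypothetical sketch, not the patch's actual code (which uses a JSON library, and later moved to generated protobuf classes for trunk); the field names are assumptions:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of serializing a regionserver-group record to JSON.
public class GroupJsonSketch {
    static String toJson(String name, List<String> servers, List<String> tables) {
        // Quote each element and join with commas to form JSON arrays.
        String s = servers.stream().map(v -> "\"" + v + "\"")
                .collect(Collectors.joining(","));
        String t = tables.stream().map(v -> "\"" + v + "\"")
                .collect(Collectors.joining(","));
        return "{\"name\":\"" + name + "\",\"servers\":[" + s
                + "],\"tables\":[" + t + "]}";
    }

    public static void main(String[] args) {
        String json = toJson("my_group", List.of("rs1:60020"), List.of("t1"));
        if (!json.equals(
                "{\"name\":\"my_group\",\"servers\":[\"rs1:60020\"],\"tables\":[\"t1\"]}"))
            throw new AssertionError();
        System.out.println("ok");
    }
}
```

A PB message with the same three fields (`name`, repeated `servers`, repeated `tables`) would carry identical information in a versioned binary form, which is why the trunk patch could swap the wire format without changing the data model.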

          toffer Francis Liu added a comment -

          My main question is - what is the state of the trunk patch?

          Our main focus was to get something released internally thus it's a bit behind. Now that we have things more or less stabilized we'll have time to get this done.

          Another one may be (may be - due to not having seen the rest of the patch yet) - why is some JSON serialization used?

          It was an easy way of not having to deal with actually serializing the data. I'm open to suggestions. For trunk we can use PB.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12568784/HBASE-6721_94_7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4409//console

          This message is automatically generated.

          enis Enis Soztutar added a comment -

With this, master, zk and META are still shared across different groups with little isolation guarantees or SLAs. I am wondering whether this is worth the effort without having proper multi-tenancy at all levels. Isn't it much easier to install multiple clusters?

          toffer Francis Liu added a comment -

          With this, master, zk and META are still shared across different groups with little isolation guarantees or SLAs. I am wondering whether this is worth the effort without having proper multi-tenancy at all levels.

The isolation it provides has been very useful for us. We already have this deployed on 2 clusters. I don't see sharing the master as an issue, as interaction with it is normally administrative. As for ZK, there's an effort to make it highly scalable when the majority of clients are read-only. Regarding META, you can have a group which serves only internal tables.

I am wondering whether this is worth the effort without having proper multi-tenancy at all levels. Isn't it much easier to install multiple clusters?

This is much easier to operate (installing, reallocating servers, tables, etc.) while providing us the level of isolation we need. Interacting with it is easier as well: one single configuration file to access any of the tables (if ACLs permit). Keep in mind this does not preclude adding more multi-tenancy features (i.e. rate limiting, priorities, quotas, etc.). This is just the first step, one which we felt provided the biggest gains as well as adding a new primitive we can build upon.

          apurtell Andrew Purtell added a comment -

          With this, master, zk and META are still shared across different groups with little isolation guarantees or SLAs. I am wondering whether this is worth the effort without having proper multi-tenancy at all levels.

I agree with this, but in defense of Francis' approach here, HDFS is also shared (though it helps that writes prefer the local datanode, as do short-circuit reads). Admission control / QoS is a hard problem that requires a full-stack conversation and, in my opinion, calls for a framework for admission control spanning from Common outward: something similar to a security access token (perhaps even an attribute of same) with support at each layer for mapping a logical notion of priority or quota to the actual enforcement mechanism, whatever it is. That will be a difficult project to say the least, and in the meantime we could make incremental improvements at various layers of the stack where there is a demonstrated benefit.

          zjushch chunhui shen added a comment -

          Francis Liu
What is the status of this now?
Is it available in your cluster?
When do you plan to apply it to trunk?

Since we have all the tables' group info, could we remove GroupInfo.TABLEDESC_PROP_GROUP?

Could we use only an HBase table to store the group information?

When we update the group information, why not update the in-memory state first? Then we wouldn't need to roll back if persistence failed.

          toffer Francis Liu added a comment -

          chunhui shen

We have had the 0.94 patch running in our cluster for a while now.

          I've uploaded the trunk patch on review board: https://reviews.apache.org/r/9680/diff/#index_header

          Since we have all the tables' group info, could we remove GroupInfo.TABLEDESC_PROP_GROUP?

          This is needed so we can create and assign a table to a group as part of a create table request.

          Could we only use hbase table to storage group information?

          Yes this is already done in the current implementation.

          When we update the group information, why not update the memory the first? Then we need't roll back if persistence failed

          Can you call out which section in GroupInfoManager you're referring to?
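
The update-ordering question above can be illustrated with a toy sketch (hypothetical code, not GroupInfoManager itself; the class and method names are made up): if the new state is persisted before the in-memory copy is mutated, a failed write leaves memory untouched and no rollback is needed, whereas updating memory first would require undoing the change on failure.

```java
import java.io.IOException;
import java.util.*;

// Toy illustration of persist-before-mutate ordering for group metadata.
public class GroupStoreOrderingSketch {
    interface Store { void persist(Map<String, String> snapshot) throws IOException; }

    private final Map<String, String> tableToGroup = new HashMap<>();
    private final Store store;

    GroupStoreOrderingSketch(Store store) { this.store = store; }

    synchronized void moveTable(String table, String group) throws IOException {
        Map<String, String> next = new HashMap<>(tableToGroup);
        next.put(table, group);
        store.persist(next);            // may throw; memory not yet touched
        tableToGroup.put(table, group); // only reached if persistence succeeded
    }

    synchronized String groupOf(String table) {
        return tableToGroup.getOrDefault(table, "default");
    }

    public static void main(String[] args) throws IOException {
        // Happy path: persistence succeeds, memory is updated.
        GroupStoreOrderingSketch ok = new GroupStoreOrderingSketch(s -> {});
        ok.moveTable("t1", "g1");
        if (!ok.groupOf("t1").equals("g1")) throw new AssertionError();

        // Failure path: persistence throws, memory stays consistent with no rollback.
        GroupStoreOrderingSketch failing = new GroupStoreOrderingSketch(s -> {
            throw new IOException("store down");
        });
        try {
            failing.moveTable("t1", "g1");
            throw new AssertionError("expected IOException");
        } catch (IOException expected) { /* memory unchanged */ }
        if (!failing.groupOf("t1").equals("default")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The trade-off chunhui raises is the mirror image: mutate memory first and the persistence step needs no snapshot, but a failed write then requires a compensating rollback to keep memory and storage in sync.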

          toffer Francis Liu added a comment -

I need to update the 0.94 patch to match the changes made in trunk. Will post it soon.

          toffer Francis Liu added a comment -

          Forgot to mention Vandana and I worked on this patch.

          toffer Francis Liu added a comment -

Included the trunk patch with generated PB classes.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571470/HBASE-6721_trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4611//console

          This message is automatically generated.

          toffer Francis Liu added a comment -

          Updated design doc

          toffer Francis Liu added a comment -

          Ted Yu Why move the status to "open"?

          yuzhihong@gmail.com Ted Yu added a comment -

          Otherwise hadoop QA would treat pdf files as patches.

          zjushch chunhui shen added a comment -

I applied the latest patch to our 0.94 cluster; it seems good.

We will test it and report back on the usage.

By the way, should we add group information to the master web UI?

I can upload the web UI patch if needed.

          stack stack added a comment -

          Late to the game... but maybe below is of some help:

          Design doc should have author, date, and reference to this issue.

          Something is incomplete around here in the doc ... "configured to not use the region server grouping features,"

          Should explain why table has such an odd-looking name “0group0”

When you refer to servers in the shell, should you use the full servername rather than just host and port? e.g.:

          group_move_servers 'dest',['server1:port','server2:port']

          Doc. looks good Francis.

          toffer Francis Liu added a comment -

Stack, I've updated the doc, addressing your questions. Let me know if it's missing anything else.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12580760/HBASE-6721_8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.backup.TestHFileArchiving

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12580775/HBASE-6721_9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestFullLogReconstruction
          org.apache.hadoop.hbase.group.TestRegionServerGroups

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5470//console

          This message is automatically generated.

          yuzhihong@gmail.com Ted Yu added a comment -

          Looks like MetaScanner.allTableRegions() returned a map where some entries have a null ServerName.
          There is no null check:

                  ServerName serverName = HRegionInfo.getServerName(rowResult);
                  regions.put(new UnmodifyableHRegionInfo(info), serverName);
          

          After adding a null check in GroupAdminEndpoint#moveTables:

                for (Map.Entry<HRegionInfo, ServerName> entry: regionMap.entrySet()) {
                  if (entry.getValue() == null) continue;
          

          TestRegionServerGroups#testTableMoveAndDrop passed reliably.
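          The null-guard pattern above can be sketched in isolation. The following is a minimal, self-contained illustration using plain String keys and values as hypothetical stand-ins for HRegionInfo/ServerName (these are not the HBase types): regions with no current assignment carry a null value and must be skipped before attempting a move.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NullSafeMove {
    // Counts entries that can actually be moved, skipping regions whose
    // value (the hosting server) is null, i.e. regions not yet assigned.
    static int countMovableRegions(Map<String, String> regionMap) {
        int movable = 0;
        for (Map.Entry<String, String> entry : regionMap.entrySet()) {
            if (entry.getValue() == null) {
                continue; // no ServerName for this region; skip it
            }
            movable++;
        }
        return movable;
    }

    public static void main(String[] args) {
        Map<String, String> regions = new LinkedHashMap<>();
        regions.put("region-a", "server1:16020");
        regions.put("region-b", null);             // unassigned region
        regions.put("region-c", "server2:16020");
        System.out.println(countMovableRegions(regions)); // prints 2
    }
}
```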

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12580943/HBASE-6721_9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.backup.TestHFileArchiving

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5483//console

          This message is automatically generated.

          toffer Francis Liu added a comment -

          fixed line length and missing apache header

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12581000/HBASE-6721_10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.util.TestHBaseFsck

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5485//console

          This message is automatically generated.

          apurtell Andrew Purtell added a comment -

          The current state of this issue is confusing. The parent is in 'Patch Available' state but seems stale. Subtask #3 has been committed to trunk/0.96 and 0.94. Other subtasks are resolved as 'Invalid'. We should at least have a release note describing what is going to be in 0.96 (and already in 0.94). What further work is going to be done here? What has been abandoned? Has enough functionality been committed already to address the contributors' use case? Did something turn out to be problematic? Nothing more for 0.96 since stack has frozen 0.95, correct? Nothing more needed in 0.96 to support placing tables on certain regionservers only (will help with the 0.96 to 0.98 migration story)? I can research the answers to my questions in the commit history, but maybe we can get an authoritative answer from Vandana Ayyalasomayajula or Francis Liu?

          toffer Francis Liu added a comment -

          The current state of this issue is confusing. The parent is in 'Patch Available' state but seems stale.

          The decision was to get the NS patch in and have this follow. Hence it got stale.

          Subtask #3 has been committed to trunk/0.96 and 0.94. Other subtasks are resolved as 'Invalid'. We should at least have a release note describing what is going to be in 0.96 (and already in 0.94).

          Subtask #3 is a generic requirement: support for MasterCoprocessor Endpoints. It's an implementation specific to this jira. And so is the nature of subtask #6. Do we need release notes for this? I'm not familiar with what needs to get into release notes, or how to update them.

          What further work is going to be done here? What has been abandoned? Has enough functionality been committed already to address the contributors' use case? Did something turn out to be problematic?

          It essentially needs a rebase. Also I'd like to update it to make use of FavoredNodes. This jira and subtask #4 need to be completed to get basic functionality.

          Nothing more for 0.96 since stack has frozen 0.95, correct? Nothing more needed in 0.96 to support placing tables on certain regionservers only (will help with the 0.96 to 0.98 migration story)? I can research the answers to my questions in the commit history, but maybe we can get an authoritative answer from Vandana Ayyalasomayajula or Francis Liu?

          In terms of backwards compatibility issues, the only piece missing was updating the LoadBalancer interface, which is subtask #6. Though I did a recheck: the LoadBalancer interface also needs a configure/initialize method (my bad); see LoadBalancer in the patch. Hopefully we can get that into 0.96. If not, we'll have to write code to support old LoadBalancers that don't have this method.

          My memory is a bit rusty, but essentially the rest of the code resides in the custom balancer and the coprocessors. Maybe some internal code changes in the AM. So no backwards compatibility issues apart from the one I mentioned. I'll double check early next week.
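          The configure/initialize hook discussed above can be sketched roughly as follows, assuming hypothetical names (this is not the actual HBase LoadBalancer interface): the group-based balancer needs a lifecycle callback after construction so it can load group membership before any assignment calls arrive.

```java
import java.util.List;

// Hypothetical, simplified balancer interface; names are illustrative only.
interface Balancer {
    void initialize();                   // proposed lifecycle hook
    String pickServer(String regionName);
}

// A group-aware implementation that only ever assigns regions to servers
// belonging to its rsgroup, loaded during initialize().
class GroupAwareBalancer implements Balancer {
    private List<String> groupServers;

    @Override
    public void initialize() {
        // In the real feature this would read group membership from the
        // rsgroup store; here we seed a static list for illustration.
        groupServers = List.of("server1:16020", "server2:16020");
    }

    @Override
    public String pickServer(String regionName) {
        // Deterministically map the region onto a server inside the group,
        // so placement never leaves the group.
        int idx = Math.floorMod(regionName.hashCode(), groupServers.size());
        return groupServers.get(idx);
    }
}
```

          An older balancer that lacks initialize() would need an adapter supplying a no-op implementation, which is the extra compatibility code mentioned above.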

          toffer Francis Liu added a comment -

          It's an implementation specific to this jira.

          Sorry, I meant "it's not an implementation specific...."

          stack stack added a comment -

          This is to be finished post-0.96 then. We should move the issue out, Francis Liu. Adding an init/configure to LoadBalancer could come into 0.96 but would have to be available real fast.

          toffer Francis Liu added a comment -

          Yep, post-0.96 is fine. It could be backported to minor versions if need be. Is getting the init/configure patch up tonight fast enough?

          stack stack added a comment -
          Francis Liu yes
          Hide
          apurtell Andrew Purtell added a comment - - edited

          Thanks for clearing it up guys.

          Edit: Would be good if we can get this into 0.98 and also a 0.96 minor, then there's a great rolling upgrade story. Please let me know if I can be helpful there.

          stack stack added a comment -

          Moving out new feature. Chatting w/ Francis Liu last night, this could be a good one for 0.98.

          apurtell Andrew Purtell added a comment -

          Well then we couldn't use this to stage a 0.96 to 0.98 upgrade on just one placement group. I think that would be a great test of our compatibility story. Any reason this can't come in on a minor 0.96.x release?

          stack stack added a comment -

          It's moved out because no dev is going on and it's a feature.

          Any reason this can't come in on a minor 0.96.x release?

          Because 0.98 will be weeks behind, and groups+tags will be a reason to upgrade.

          stack stack added a comment -

          Sorry, should have said also that I do not see us needing groups to try out a 0.98 daemon in a 0.96 cluster.

          toffer Francis Liu added a comment -

          Draft of trunk patch. I still need to move the group APIs from the CP to the core API. Everything else should remain the same, so most of the patch shouldn't change pending review comments.

          Review board here:

          https://reviews.apache.org/r/27673/

          yuzhihong@gmail.com Ted Yu added a comment -

          @Francis:
          I went through the updated patch but didn't seem to find utilization of FavoredNodes.
          Would that be handled in a separate JIRA?

          Getting a Hadoop QA run on this patch is desirable.

          toffer Francis Liu added a comment -

          Ted Yu Yes, that would be in a separate Jira. This patch is already largish. How do I get Hadoop QA to run?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12679918/HBASE-6721_trunk2.patch
          against trunk revision .
          ATTACHMENT ID: 12679918

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 27 new or modified tests.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 7 warning messages.

          -1 checkstyle. The applied patch generated 3841 checkstyle errors (more than the trunk's current 3788 errors).

          -1 findbugs. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + * <code>rpc ListTablesOfGroup(.ListTablesOfGroupRequest) returns (.ListTablesOfGroupResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.GetGroupInfoOfTableRequest) returns (.GetGroupInfoOfTableResponse);</code>
          + * <code>rpc GetGroupOfServer(.GetGroupOfServerRequest) returns (.GetGroupOfServerResponse);</code>
          + * <code>rpc ListServersInTransition(.ListServersInTransitionRequest) returns (.ListServersInTransitionResponse);</code>
          + * <code>rpc ListTablesOfGroup(.ListTablesOfGroupRequest) returns (.ListTablesOfGroupResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.GetGroupInfoOfTableRequest) returns (.GetGroupInfoOfTableResponse);</code>
          + * <code>rpc GetGroupOfServer(.GetGroupOfServerRequest) returns (.GetGroupOfServerResponse);</code>
          + * <code>rpc ListServersInTransition(.ListServersInTransitionRequest) returns (.ListServersInTransitionResponse);</code>
          + public void listTablesOfGroup(RpcController controller, RSGroupProtos.ListTablesOfGroupRequest request, RpcCallback<RSGroupProtos.ListTablesOfGroupResponse> done) {
          + public void getGroupInfo(RpcController controller, RSGroupProtos.GetGroupInfoRequest request, RpcCallback<RSGroupProtos.GetGroupInfoResponse> done) {

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/checkstyle-aggregate.html

          Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//artifact/patchprocess/patchJavadocWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11725//console

          This message is automatically generated.

          Hide
          enis Enis Soztutar added a comment -

          I left some comments in RB, but I wanted to comment on high level things here.

          I think this is generally a useful feature, since we are seeing more and more multi-tenant use cases; in the absence of really good isolation, rs groups allow the cluster to be shared more easily.

          • I was reading up on the recent YARN label feature (YARN-796), since the ideas are very similar. However, with YARN labels a node can have multiple labels, whereas here a server can only have one group. Would it be better to have a one-to-many relationship from servers to groups? All servers would have the default group, as well as any other groups they are assigned. We could do the same thing for tables as well: by default, tables would belong to the default group, but could be added to more. For assignment, we would just look at the cumulative list of servers that a region is assignable to. Would this complicate assignment?
          • Alternatively, we could also name this feature "labels" (just a suggestion).
          • For all new features, I think it is fair to request a single source of truth and also some more transactional guarantees for operations. If possible, we should not use zk at all (as is done for caching in the patch). Instead we can just open the region from the master. This is different from the co-location discussion for master and meta: this table is tiny, and not queryable from clients. The data is just there for the master to access. If we do this, we do not even need a cache.
          • I am ok with bringing this in as core functionality rather than CP + LB. The reasoning is that as clusters grow bigger, more users might be interested in this for better isolation.
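
          A minimal sketch of the one-to-many model suggested above, assuming a table's assignable servers are simply the union of the servers of every group it belongs to. The class, method, group, and server names are hypothetical illustrations, not the actual RSGroup API:

```java
import java.util.*;

public class GroupAssignmentSketch {
    // Union of the servers of every group the table belongs to. In this
    // hypothetical model a table (like a server) may be in several groups.
    static Set<String> assignableServers(Map<String, Set<String>> groupToServers,
                                         Set<String> tableGroups) {
        Set<String> result = new TreeSet<>();
        for (String group : tableGroups) {
            result.addAll(groupToServers.getOrDefault(group, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groups = new HashMap<>();
        groups.put("default", Set.of("rs1:16020", "rs2:16020"));
        groups.put("batch", Set.of("rs3:16020"));
        // A table in both "default" and "batch" is assignable to all three servers.
        System.out.println(assignableServers(groups, Set.of("default", "batch")));
        // -> [rs1:16020, rs2:16020, rs3:16020]
    }
}
```

          The balancer would then restrict candidate regionservers for a region to this cumulative set, which is the "will this complicate assignment?" question: the candidate set is no longer a single group's server list.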
          toffer Francis Liu added a comment -

          Enis, thanks for the review:

          Will it be better to have one-to-many relationship from servers to groups? All servers will have the default group, as well as any more groups that it is assigned. We can do the same thing for tables as well. By default, tables will belong to group default, but can be added more.

          I skimmed through the YARN-796 doc. To me it seems the notions of YARN labels and rs groups are very different. Labels are intended to add attributes to certain resources (i.e. GPU, high-memory, etc.) so you can ask for specific attributes when requesting resources, while the groups in this jira identify a logical grouping of resources, meant to guarantee an amount of resources/capacity for a tenant. We have not seen a need for overlapping groups to date. What do we gain from doing the same for tables? It seems to me the purpose of groups has no need for overlaps either way. Please correct me if I missed something.

          For all new features, I think it is fair to request one single source of truth and also some more transactional guarantees for operations. If possible, we should not use zk at all (for caching as done in patch). Instead we can just open the region from master. This is different than the co-location discussion for master and meta. This table is tiny, and not query'able from clients. The data is just there for the master to access. If we do this, we do not even have to have a cache.

          I have thought of this as well. We can do it, but it would require handling the group table in a special way: i.e. make sure it is assigned after meta and before any other table, have a group WAL, have its own set of handlers similar to meta, etc. We would also need a special case for making sure meta and root end up in their designated groups even when the group table hasn't been assigned yet. So we're essentially trading one set of complexities for another. Having said that, I'm ok with either approach. Should I write up a patch with this approach?
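
          The bootstrap ordering described above (group table assigned right after meta, before any other table) could be sketched as a simple priority sort. The table names and priority values here are illustrative assumptions, not the actual master assignment code:

```java
import java.util.*;

public class BootstrapOrderSketch {
    // Lower number = assigned earlier. Table names are illustrative.
    static int priority(String table) {
        if (table.equals("hbase:meta")) return 0;     // meta always first
        if (table.equals("hbase:rsgroup")) return 1;  // group table before user tables
        return 2;                                     // everything else
    }

    static List<String> assignmentOrder(Collection<String> tables) {
        List<String> order = new ArrayList<>(tables);
        Comparator<String> byPriority = Comparator.comparingInt(BootstrapOrderSketch::priority);
        order.sort(byPriority.thenComparing(Comparator.naturalOrder()));
        return order;
    }

    public static void main(String[] args) {
        System.out.println(assignmentOrder(List.of("t1", "hbase:rsgroup", "hbase:meta")));
        // -> [hbase:meta, hbase:rsgroup, t1]
    }
}
```

          The ordering itself is the easy part; as noted above, the real cost is everything else the special-cased table drags in (dedicated WAL, dedicated handlers, and the chicken-and-egg case where meta must land in its group before the group table is even open).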

          On a related note, region servers consume group information from ZK, similar to the security ACLs. This is used in replication and other features we are working on. As mentioned, it is possible to remove the caching usage of ZK, but it would be good to keep the information there until the same facility that the security ACLs will eventually migrate to is available. Thoughts?

          I am ok with bringing this as a core functionality rather than CP + LB. The reasoning is that as clusters grow bigger, more users might be interested in this for better isolation.

          Great, with this we can do the approach mentioned earlier if need be.

          eclark Elliott Clark added a comment -

          I still don't see the need for the added complexity for most users. HBase is running the risk of being too big and too complex for what most users will use it for.

          The real answer to get large clusters is to get multi-tenancy to just work.

          The best answer is not to add more admin tools, config options, and things that add more operational overhead. As you add more and more machines to HBase, the answer is not to make more and more special cases; it's to make the base case better. Get QoS, quotas, tiered storage, and better load balancing to work well, and the whole idea of group-based assignment becomes unnecessary. That's what the solution for users of HBase should be. Not more things that need to be explained in the book for the default config/install. Things like:

          • Does locality-based load balancing work with groups?
          • What should hbck return when there's a region that can't be assigned due to groups?
          • Does hbck know about groups? If it does, then what's the message there when I haven't used that feature? If not, will it break things when I use hbck?
          • How does the open region command work with groups?
          • Are the commands listed if the balancer isn't the group-based balancer?
          • What's the default group?
          • Is meta/namespace/acl in the default group?
          • I had zookeeper get deleted/go corrupt/get restored from backup; why are all of my regions moving around?

          All of those are added complexities that real users will run into, and in return only a very few will get any added benefit.

          HBase can't provide great QoS and multi-tenancy right now, so group-based assignment might work well for a specific use case. However it adds complexity and layers of abstraction that hurt most users, while making the ideal solution harder. For that reason I'm against anything that's built into the default install.

          HBase should be more like nginx and less like apache in this regard. If you need an added feature and it's not what most users want, then compile/package it in. It should not be a kitchen sink with 10 million different xml configs needed. The first question asked on every posting to the user@ mailing list shouldn't always be "can you post your configs?"

          With all that said, I do not want this to seem like I dislike the feature, the code, or the contribution. I really want to say thanks to Francis Liu. I would be +1 if the code were a co-proc and a separate module, something that was much less invasive on the main assignment and master code.

          lhofhansl Lars Hofhansl added a comment -

          Let me second that. Many thanks to Francis Liu; it is also true that some of the changes are very complex and pretty invasive.
          Even at Salesforce we do not worry too much about cross-tenant impact; we'd rather throw enough resources at it and let HBase do the work. When we see issues we look into general fixes such as the ones that Elliott Clark mentions above.

          Now, Yahoo is a big place with very large clusters and many users on those clusters. If the need arises there for this kind of multi-tenancy, we'd better listen to them and to Francis before we decide this is not the right path forward.

          enis Enis Soztutar added a comment -

          This is the reason why we started the discussion thread on dev@ some time ago: http://search-hadoop.com/m/DHED47vIer1.

          RS group based assignment is already a plugin implementation, but we thought that bringing it in as a "core" feature made sense because it will take some time to have full QoS and isolation. In the meantime, RS group based assignments are a reasonable solution that solves a real problem (albeit at Yahoo!'s scale). However, we are seeing more interest in this patch from other users as well.

          If there is strong objection for this being a core feature, I think we should still commit this as a co-proc based implementation.

          apurtell Andrew Purtell added a comment - - edited

          It could be useful to have a list of pros and cons for coprocessor implementation vs. core integration. Are we considering core integration mainly for code aesthetic reasons, for example, or are there significant obstacles imposed by a coprocessor implementation strategy?

          lhofhansl Lars Hofhansl added a comment -

          If there is strong objection for this being a core feature

          I'm not objecting. If this is needed, we should choose the path of least future pain. Coprocessors lead to pain (as seen with Phoenix, where it is hard to implement core features without using some of our private interfaces), but core changes lead to pain too.

          It could be useful to have a list of pros and cons for coprocessor implementation vs. core integration.

          +1

          toffer Francis Liu added a comment -

          Thanks for the interest and attention guys. I'll put up a seed list of pros and cons over the weekend. And we can hammer out concerns there.

          Similar to namespaces, regionserver groups can actually be enabled by default, and existing users wouldn't notice any difference (except for the extra table that needs to be assigned) unless they actually use the feature (similar to namespaces).

          FWIW I believe this feature is useful even for small clusters as long as you have a variety of tenants (or just misbehaving tenants). There are a bunch of scenarios that need protection/isolation/QoS, e.g. cascading failures caused by a tenant, block cache thrashing, different configurations (e.g. GC configs), different requirements. This becomes even more important in a mixed workload environment (MR, storm, etc.).

          toffer Francis Liu added a comment -

          Here's the doc as promised:

          https://docs.google.com/document/d/1tUCDEEEFknn-HYlDqA72gtv02_nVsoKY_Er_8nBoW_0/edit

          Comments are enabled; ping me if anyone wants edit access to the doc.

          I've also updated the design doc.

          Fire away.

          toffer Francis Liu added a comment -

          We discussed this during HBaseCon. The consensus was to get this into a feature branch to address any stability/integration issues as well as any other concerns.

          Jonathan Hsieh Elliott Clark Enis Soztutar How do we go about creating the branch? Should we finish the review and merge it in? Should I rebase?

          jmhsieh Jonathan Hsieh added a comment -

          Francis Liu. Here's my suggestion: let's finish the current review in review board. Then, instead of rebasing, let's use that patch from the point you branched at and start an 'hbase-6721' branch from there. We can do periodic merges into the 'hbase-6721' branch afterwards and capture the changes due to the merges.

          toffer Francis Liu added a comment -

          Works for me.

          yuzhihong@gmail.com Ted Yu added a comment -

          Looks like branch hbase-6721 has not been created.

          Should we proceed with what Jon suggested above?

          apurtell Andrew Purtell added a comment -

          Creating the branch is easy. Was the review on RB finished?

          toffer Francis Liu added a comment -

          Unless I'm missing something, it seems the review is still pending.

          Jonathan Hsieh would you still be able to complete the review?

          jmhsieh Jonathan Hsieh added a comment -

          Sorry, I didn't realize I was blocking. I can't commit to doing a review in the short term – please proceed without me. Let me suggest that we commit what we have with a cursory review to a branch and then make progress there.

          apurtell Andrew Purtell added a comment -

          Let me suggest that we commit what we have with a cursory review to a branch and then make progress there.

          +1
          Ping me for assistance when you're ready to commit to a branch, Francis Liu.

          toffer Francis Liu added a comment -

          Thanks Andrew Purtell, let me go through the patch and rebase it as well. Will put up an updated patch this week.

          toffer Francis Liu added a comment -

          Rebased patch.

          toffer Francis Liu added a comment -

          Andrew Purtell I rebased the patch onto trunk. This should be good enough to commit to a branch.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12747704/HBASE-6721_11.patch
          against master branch at commit 05de2ec5801fbba4577fb363f858a6e6f282c104.
          ATTACHMENT ID: 12747704

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          -1 checkstyle. The applied patch generated 1908 checkstyle errors (more than the master's current 1864 errors).

          -1 InterfaceAudience. The patch appears to contain InterfaceAudience from hadoop rather than hbase:
          +import org.apache.hadoop.classification.InterfaceAudience;
          +import org.apache.hadoop.classification.InterfaceStability;
          +import org.apache.hadoop.classification.InterfaceAudience;
          +import org.apache.hadoop.classification.InterfaceAudience;.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings (more than the master's current 0 warnings).

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + * <code>rpc getLastMajorCompactionTimestamp(.hbase.pb.MajorCompactionTimestampRequest) returns (.hbase.pb.MajorCompactionTimestampResponse);</code>
          + * <code>rpc getLastMajorCompactionTimestampForRegion(.hbase.pb.MajorCompactionTimestampForRegionRequest) returns (.hbase.pb.MajorCompactionTimestampResponse);</code>
          + * <code>rpc getProcedureResult(.hbase.pb.GetProcedureResultRequest) returns (.hbase.pb.GetProcedureResultResponse);</code>
          + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code>
          + * <code>rpc GetGroupInfoOfServer(.hbase.pb.GetGroupInfoOfServerRequest) returns (.hbase.pb.GetGroupInfoOfServerResponse);</code>
          + * <code>rpc MoveServers(.hbase.pb.MoveServersRequest) returns (.hbase.pb.MoveServersResponse);</code>
          + * <code>rpc MoveTables(.hbase.pb.MoveTablesRequest) returns (.hbase.pb.MoveTablesResponse);</code>
          + * <code>rpc RemoveGroup(.hbase.pb.RemoveGroupRequest) returns (.hbase.pb.RemoveGroupResponse);</code>
          + * <code>rpc BalanceGroup(.hbase.pb.BalanceGroupRequest) returns (.hbase.pb.BalanceGroupResponse);</code>

          +1 site. The mvn post-site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14922//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14922//artifact/patchprocess/patchReleaseAuditWarnings.txt
          Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14922//artifact/patchprocess/newFindbugsWarnings.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14922//artifact/patchprocess/checkstyle-aggregate.html

          Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14922//artifact/patchprocess/patchJavadocWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14922//console

          This message is automatically generated.

          apurtell Andrew Purtell added a comment - - edited

          Thanks Francis Liu. I applied your latest patch to master and pushed the result as a new branch "hbase-6721". I checked that it compiled before pushing but didn't run tests.

          Do you need this for 0.98? If so, we can do another branch "hbase-6721-0.98" with a backport and rebase it at every RC. Or branch-1.

          toffer Francis Liu added a comment -

          Thanks Andrew Purtell, that'd be great; let me work on one.

          gwu91 Guorui Wu added a comment -

          As I've been studying this code, I've created some UML sequence diagrams of functions within GroupBasedLoadBalancer.java to aid understanding of HBASE-6721. The diagram focuses only on GroupBasedLoadBalancer, since GroupBasedLoadBalancer, along with GroupInfoManager, GroupInfoManagerImpl and GroupInfo, as well as some way to configure groups, represents the core functionality needed for this implementation. This covers the 1,500 core lines implementing the region server group functionality; this documentation does not cover the extensive configuration management via the CLI, stored in Zookeeper and an HBase table.

          The goal of GroupBasedLoadBalancer is to separate servers and regions into groups, and to balance within groups using another load balancer (referred to as internal load balancer). The default load balancer used is StochasticLoadBalancer. Each group is stored as a GroupInfo object, and GroupInfoManagerImpl holds a collection of GroupInfo objects with APIs allowing for the manipulation of this collection. These groups can be created and configured via the CLI, and are stored within an HBase table as well as Zookeeper. As GroupBasedLoadBalancer implements the LoadBalancer interface, it has the following functions from the interface:
          • balanceCluster
          • roundRobinAssignment
          • retainAssignment
          • immediateAssignment
          • randomAssignment
          • initialize
          • isStopped
          • setGroupInfoManager

          Additionally, GroupBasedLoadBalancer has several helper functions:
          • offlineRetainAssignment
          • onlineRetainAssignment
          • generateGroupMaps
          • filterOfflineServers
          • filterServers
          • getMisplacedRegions
          • correctAssignments
          • isOnline
          • getGroupInfoManager
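
          To make the delegation concrete, here is a rough, hypothetical sketch (plain Java, not the actual HBase code; the names internalBalance and balanceByGroup are invented for illustration) of how a group-based balancer can partition regions and servers by group and hand each partition to an internal balancer:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch (not the actual HBase code) of the per-group delegation
// described above: regions and servers are partitioned by group, and an
// internal balancer assigns regions only within their own group.
public class GroupDelegationSketch {

  // Stand-in for the internal load balancer; the real implementation
  // defaults to StochasticLoadBalancer, here a simple round-robin.
  static Map<String, List<String>> internalBalance(List<String> regions,
                                                   List<String> servers) {
    Map<String, List<String>> plan = new HashMap<>();
    for (String s : servers) {
      plan.put(s, new ArrayList<>());
    }
    int i = 0;
    for (String r : regions) {
      plan.get(servers.get(i++ % servers.size())).add(r);
    }
    return plan;
  }

  // Partition by group, then balance each group independently.
  static Map<String, List<String>> balanceByGroup(Map<String, String> regionToGroup,
                                                  Map<String, String> serverToGroup) {
    Map<String, List<String>> regionsOfGroup = new HashMap<>();
    Map<String, List<String>> serversOfGroup = new HashMap<>();
    regionToGroup.forEach((r, g) ->
        regionsOfGroup.computeIfAbsent(g, k -> new ArrayList<>()).add(r));
    serverToGroup.forEach((s, g) ->
        serversOfGroup.computeIfAbsent(g, k -> new ArrayList<>()).add(s));

    Map<String, List<String>> plan = new HashMap<>();
    for (Map.Entry<String, List<String>> e : regionsOfGroup.entrySet()) {
      List<String> servers = serversOfGroup.getOrDefault(e.getKey(), List.of());
      // Regions of a group with no servers are simply skipped in this
      // sketch; the real balancer handles that case specially.
      if (!servers.isEmpty()) {
        plan.putAll(internalBalance(e.getValue(), servers));
      }
    }
    return plan;
  }
}
```

          The key point is that the internal balancer never sees servers or regions from more than one group at a time, so it cannot violate group boundaries.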

          Through reading the code for the load balancer portion of HBASE-6721, I have a few questions:
          1. Within the interface GroupInfoManager.java, I noticed that the function getGroupOfServer returns a GroupInfo object, but the function getGroupOfTable returns a String object. Was there a performance consideration or some other reason for returning a string? (It seems the API would be more consistent to return a GroupInfo object.)
          2. For the function onlineRetainAssignment why are regions assigned to bogus so it ends up in RIT if a server is not available? (We would like to keep as few regions in RIT as possible in order to maximize our availability.)
          3. On the topic of onlineRetainAssignment, what is the objective for separating online and offline servers? I noticed that another balancer such as the StochasticLoadBalancer does not make such a distinction.

          Lastly, the UML diagram I created can be edited by downloading the attached XML file and editing with http://draw.io.

          toffer Francis Liu added a comment -

          Draft of backported patch to 0.98. Need to run integration and unit tests, then it should be good to go.

          toffer Francis Liu added a comment -

          1. Within the interface GroupInfoManager.java, I noticed that the function getGroupOfServer returns a GroupInfo object, but the function getGroupOfTable returns a String object. Was there a performance consideration or some other reason for returning a string? (It seems the API would be more consistent to return a GroupInfo object.)

          Yeah, performance. I didn't want to have to gather the information to generate the GroupInfo object since most internal calls won't need it. The external API call is consistent, though.

          2. For the function onlineRetainAssignment why are regions assigned to bogus so it ends up in RIT if a server is not available? (We would like to keep as few regions in RIT as possible in order to maximize our availability.)

          This guarantees isolation, which is one of the key features of Region Server Groups: you don't want one group running out of live servers to affect the regions in another group. So in the worst case you affect the availability of that group and not the entire cluster. What's your use case for region server groups?

          3. On the topic of onlineRetainAssignment, what is the objective for separating online and offline servers? I noticed that another balancer such as the StochasticLoadBalancer does not make such a distinction.

          Yes, that's inherent in GroupBasedLoadBalancer. Stochastic only looks at what's online, while Group needs to look at which of its member servers are online.
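
          The isolation behavior described in the answer to question 2 can be sketched, very roughly, as follows (a hypothetical plain-Java illustration, not the actual HBase code; BOGUS_SERVER is an invented placeholder name):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified sketch (not the actual HBase code) of the isolation behavior:
// if a group has no live servers, its regions are parked on a placeholder
// ("bogus") server, so they stay in transition instead of being assigned
// to servers belonging to other groups.
public class BogusAssignmentSketch {
  static final String BOGUS_SERVER = "bogus:0"; // invented placeholder name

  static Map<String, String> assign(
      Map<String, List<String>> regionsOfGroup, // group -> its regions
      Map<String, List<String>> serversOfGroup, // group -> member servers
      Set<String> onlineServers) {
    Map<String, String> assignment = new HashMap<>(); // region -> server
    for (Map.Entry<String, List<String>> e : regionsOfGroup.entrySet()) {
      // Only member servers of this group that are currently online qualify.
      List<String> live = new ArrayList<>();
      for (String s : serversOfGroup.getOrDefault(e.getKey(), List.of())) {
        if (onlineServers.contains(s)) {
          live.add(s);
        }
      }
      int i = 0;
      for (String region : e.getValue()) {
        // Never fall back to another group's servers: park on the bogus
        // server instead, trading group availability for strict isolation.
        String target = live.isEmpty() ? BOGUS_SERVER : live.get(i++ % live.size());
        assignment.put(region, target);
      }
    }
    return assignment;
  }
}
```

          In other words, a dead group's regions go into RIT deliberately; the failure is contained to that group rather than leaking load onto the rest of the cluster.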

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12750267/HBASE-6721_98_1.patch
          against master branch at commit 9c69bf766fcad024bef5760f242cae2bc609b374.
          ATTACHMENT ID: 12750267

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 33 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15084//console

          This message is automatically generated.

          apurtell Andrew Purtell added a comment -

          Draft of backported patch to 0.98. Need to run integration and unit tests, then it should be good to go.

          Thanks Francis Liu, let me know when you think it's ready to go in and I'll make a branch with its application

          toffer Francis Liu added a comment -

          Tests passed, though I had to patch HBase/IntegrationTestingUtility to get the tests to run properly; there was a regression in an API that used to work in distributed mode and now doesn't. Will deal with that issue in a separate jira.

          Reconciled some changes with internal implementation, rebased.

          Andrew Purtell, the patch should be good to commit. Will need to create an addendum for the trunk patch.

          apurtell Andrew Purtell added a comment -

          the patch should be good to commit.

          I made a branch 'hbase-6721-0.98' with the v2 0.98 patch applied and pushed it.

          will need to create an addendum for trunk patch.

          Please post the addendum here, and I'll commit it to the hbase-6721 branch.

          When updating the feature branches do you prefer me to merge or rebase? I typically rebase in my own work but will do what you prefer, just say.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12750554/HBASE-6721_98_2.patch
          against master branch at commit 45aafb25b7911b5917ab47133e3e4268806e4c91.
          ATTACHMENT ID: 12750554

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 33 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15104//console

          This message is automatically generated.

          ndimiduk Nick Dimiduk added a comment -

          Reattaching 'HBASE-6721_0.98_2.patch' with '0.98' in the name to match the branch-specific filter.

          apurtell Andrew Purtell added a comment -

          I ran the binary compatibility checker tool comparing hbase-6721-0.98 with 0.98.13. The tool says hbase-6721-0.98 is 100% binary compatible with 0.98.13, with some source level incompatibility of mild concern in the MasterObserver interface.

          apurtell Andrew Purtell added a comment -

          Do we need GroupAdmin and GroupAdminClient now? A hold over from an initial coprocessor based implementation?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12750586/HBASE-6721_0.98_2.patch
          against 0.98 branch at commit 45aafb25b7911b5917ab47133e3e4268806e4c91.
          ATTACHMENT ID: 12750586

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 33 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 28 warning messages.

          -1 checkstyle. The applied patch generated 3913 checkstyle errors (more than the master's current 3876 errors).

          -1 InterfaceAudience. The patch appears to contain InterfaceAudience from hadoop rather than hbase:
          +import org.apache.hadoop.classification.InterfaceAudience;
          +import org.apache.hadoop.classification.InterfaceStability;
          +import org.apache.hadoop.classification.InterfaceAudience;
          +import org.apache.hadoop.classification.InterfaceAudience;.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings (more than the master's current 0 warnings).

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + public GetGroupInfoResponse getGroupInfo(RpcController controller, GetGroupInfoRequest request) throws ServiceException {
          + public GetGroupInfoOfTableResponse getGroupInfoOfTable(RpcController controller, GetGroupInfoOfTableRequest request) throws ServiceException {
          + public GetGroupInfoOfServerResponse getGroupInfoOfServer(RpcController controller, GetGroupInfoOfServerRequest request) throws ServiceException {
          + public MoveServersResponse moveServers(RpcController controller, MoveServersRequest request) throws ServiceException {
          + public MoveTablesResponse moveTables(RpcController controller, MoveTablesRequest request) throws ServiceException {
          + public AddGroupResponse addGroup(RpcController controller, AddGroupRequest request) throws ServiceException {
          + public RemoveGroupResponse removeGroup(RpcController controller, RemoveGroupRequest request) throws ServiceException {
          + public BalanceGroupResponse balanceGroup(RpcController controller, BalanceGroupRequest request) throws ServiceException {
          + public ListGroupInfosResponse listGroupInfos(RpcController controller, ListGroupInfosRequest request) throws ServiceException {
          + * <code>rpc GetGroupInfoOfTable(.GetGroupInfoOfTableRequest) returns (.GetGroupInfoOfTableResponse);</code>

          -1 site. The patch appears to cause mvn post-site goal to fail.

          +1 core tests. The patch passed unit tests in .

          -1 core zombie tests. There are 5 zombie test(s): at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRolling(TestLogRolling.java:219)
          at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:227)
          at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testRITStateForRollback(TestSplitTransactionOnCluster.java:292)
          at org.apache.hadoop.hbase.io.encoding.TestEncodedSeekers.testEncodedSeeker(TestEncodedSeekers.java:118)
          at org.apache.hadoop.hbase.TestClusterBootOrder.testBootRegionServerFirst(TestClusterBootOrder.java:104)

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15105//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15105//artifact/patchprocess/patchReleaseAuditWarnings.txt
          Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15105//artifact/patchprocess/newFindbugsWarnings.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15105//artifact/patchprocess/checkstyle-aggregate.html

          Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15105//artifact/patchprocess/patchJavadocWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15105//console

          This message is automatically generated.

          toffer Francis Liu added a comment -

          Thanks Nick Dimiduk, I wasn't aware of that feature.

          toffer Francis Liu added a comment -

          I ran the binary compatibility checker tool comparing hbase-6721-0.98 with 0.98.13. The tool says hbase-6721-0.98 is 100% binary compatible with 0.98.13, with some source level incompatibility of mild concern in the MasterObserver interface.

          This is probably because of the new methods this patch adds to the interface. This shouldn't be a concern then?

          Do we need GroupAdmin and GroupAdminClient now? A hold over from an initial coprocessor based implementation?

          This was based on review comments from Enis Soztutar, though I'm not sure if the comment still stands post-cp implementation. I am fine either way. Should this be in HBaseAdmin then?

          https://reviews.apache.org/r/27673/diff/2?file=751817#file751817line52
          https://reviews.apache.org/r/27673/diff/2?file=751816#file751816line35

          apurtell Andrew Purtell added a comment -

          Do we need GroupAdmin and GroupAdminClient now? A hold over from an initial coprocessor based implementation?

          This was based on review comments from Enis Soztutar, though I'm not sure if the comment still stands post-cp implementation. I am fine either way. Should this be in HBaseAdmin then?

          I opened HBASE-14226 to discuss merging them.

          apurtell Andrew Purtell added a comment -

          The tool says hbase-6721-0.98 is 100% binary compatible with 0.98.13, with some source level incompatibility of mild concern in the MasterObserver interface.

          This is probably because of the new methods this patch adds to the interface. This shouldn't be a concern then?

          Yes, it will be a concern for things like Apache Phoenix. (See their IndexMasterObserver, etc.) We can handle this by using compatibility helpers that won't attempt to invoke the new APIs on MasterObservers that do not implement them. We do something like this in WALCoprocessorHost.
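
          A minimal sketch of that guard pattern (class and method names here are illustrative, not the real HBase interfaces): the new hooks live on an extended observer interface, and the host dispatches them only to observers that actually implement it, so pre-existing MasterObserver implementations keep working unchanged.

          ```java
          import java.util.Arrays;
          import java.util.List;

          // Sketch of a compatibility helper: legacy observers are skipped for
          // the new hook instead of failing at runtime.
          public class ObserverCompat {
            interface MasterObserver {
              void preCreateTable(String table);
            }
            // New, optional hooks go on an extended interface rather than
            // being added to MasterObserver directly.
            interface GroupMasterObserver extends MasterObserver {
              void preMoveTable(String table, String group);
            }

            static class LegacyObserver implements MasterObserver {
              @Override public void preCreateTable(String table) { }
            }
            static class GroupAwareObserver implements GroupMasterObserver {
              @Override public void preCreateTable(String table) { }
              @Override public void preMoveTable(String table, String group) { }
            }

            // Returns how many observers received the new hook; observers that
            // predate it are silently skipped.
            static int invokePreMove(List<MasterObserver> observers, String table, String group) {
              int invoked = 0;
              for (MasterObserver o : observers) {
                if (o instanceof GroupMasterObserver) {
                  ((GroupMasterObserver) o).preMoveTable(table, group);
                  invoked++;
                }
              }
              return invoked;
            }

            public static void main(String[] args) {
              List<MasterObserver> obs =
                  Arrays.asList(new LegacyObserver(), new GroupAwareObserver());
              // Only the group-aware observer is invoked.
              System.out.println(invokePreMove(obs, "t1", "my_group")); // prints 1
            }
          }
          ```

          The same effect can also be had with a default method on the base interface; the instanceof check mirrors the dispatch-time approach described for WALCoprocessorHost.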

          enis Enis Soztutar added a comment -

          This was based on review comments from Enis Soztutar, though I'm not sure if the comment still stands post-cp implementation. I am fine either way. Should this be in HBaseAdmin then?

          This comes back to the discussion of whether RS groups are coprocessor based or core feature. If a core feature, then GroupAdmin methods should move to Admin interface. Otherwise, a GroupAdmin interface + a way to construct the GroupAdminClient would be the way to go.

          toffer Francis Liu added a comment -

          Andrew Purtell Attached is the addendum patch which addresses three things: 1. forward-port some changes from the 0.98 feature branch, 2. address CP backward-compatibility support, 3. address removal of GroupAdmin and GroupAdminClient and moving the APIs to Admin. Let me know if you'd like me to break these up into separate patches. I didn't do that as I thought it'd be more complicated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12752029/HBASE-6721_hbase-6721_addendum.patch
          against master branch at commit 9334a47d4570f8adfc003f0fb2c5969a88c3bba0.
          ATTACHMENT ID: 12752029

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 21 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15222//console

          This message is automatically generated.

          apurtell Andrew Purtell added a comment -

          Let me know if you'd like me to break these up into separate patches.

          Yes, let's do that. Now that we have feature branches with the previous patches posted here committed, all new work takes the form of addendums. It's best IMHO if the addendums have their own JIRA identifiers so we can collect patches and any discussion for each on separate tickets instead of all lumped in here.

          1. Forward port some changes from 0.98 feature branch

          This needs committing to the master branch only. I suggest opening a subtask for it.

          2. Address CP backward compatibility support

          This needs committing to both master and 0.98 branches. Work should be done on HBASE-14232. Pretty sure we will need two patches, one for master, one for 0.98

          3. Address removal of GroupAdmin and GroupAdminClient and moving the APIs to Admin.

          This needs committing to both master and 0.98 branches. Work should be done on HBASE-14226. Pretty sure we will need two patches, one for master, one for 0.98

          toffer Francis Liu added a comment -

          Andrew Purtell created the patches in separate jiras. The patches are built on top of each other. The order is: HBASE-14312, HBASE-14232, HBASE-14226. Let me know if you'd like to rebase the succeeding patches once the one(s) prior are committed.

          toffer Francis Liu added a comment -

          I'll create the 0.98 patch whenever a trunk patch is committed.

          apurtell Andrew Purtell added a comment -

          Let me know if you'd like to rebase the succeeding patches once the one(s) prior are committed.

          I'm thinking we start carving out work as subtasks, applying patches from the subtasks to hbase-6721 branch, make the 0.98 versions of them and apply them to hbase-6721-0.98 branch. This will make development on these branches almost exactly like dev on a release branch.

          On rebase vs. merge:
          1. I can rebase, fix up for changes, then force push, for both hbase-6721 and hbase-6721-0.98
          2. I can merge master into hbase-6721 (and 0.98 into hbase-6721-0.98), fix up for changes, commit the fixups in a merge commit, and then push the merged result.

          There are pros and cons to either approach. What would work best for you Francis Liu?
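
          For reference, the two options in git terms (branch names from this thread; purely illustrative, not commands to run verbatim):

          ```
          # Option 1: rebase onto the upstream branch, then force push
          git checkout hbase-6721
          git rebase master            # resolve conflicts as part of the rebase
          git push --force origin hbase-6721

          # Option 2: merge the upstream branch, fix up in the merge commit, push
          git checkout hbase-6721
          git merge master             # resolve conflicts, commit the merge
          git push origin hbase-6721
          ```

          The same applies to hbase-6721-0.98 with 0.98 as the upstream. Option 1 rewrites history, so collaborators must re-fetch and reset their local branches; option 2 preserves history at the cost of merge commits.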

          toffer Francis Liu added a comment -

          Chatted with Andrew Purtell at the powwow. We'll be doing rebases when pulling in changes from master/0.98.

          enis Enis Soztutar added a comment -

          Chatted with Andrew Purtell at the powwow. We'll be doing rebases when pulling in changes from master/0.98.

          So how are we gonna do reviews? I think only the master patch reviews are fine, but it would be good if we list the RBs here.

          apurtell Andrew Purtell added a comment -

          I was thinking review patches on subtasks as normal. Once the work in total is ready to attempt a branch merge we can put up a single merge patch for here and RB. At any time it's possible to generate omnibus patches for RB if that would be helpful.

          toffer Francis Liu added a comment -

          Sounds good. I don't have any other changes pending so I'm going to update RB with the new patch on trunk.

          Andrew Purtell thanks for taking care of the backport to 0.98 for the patches.

          toffer Francis Liu added a comment -

          Here's a new RB request:

          https://reviews.apache.org/r/27673/

          toffer Francis Liu added a comment -

          Sorry, it's not a new RB request; I updated the old one. I was debating whether I should just create a new RB request.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12753020/HBASE-6721_12.patch
          against master branch at commit cc1542828de93b8d54cc14497fd5937989ea1b6d.
          ATTACHMENT ID: 12753020

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 33 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15317//console

          This message is automatically generated.

          eclark Elliott Clark added a comment -

          I'm still officially -1 as long as this is built into the core. 99.99% (assuming 10k HBase users) of HBase users should never, ever run something like this. It will make an already very operationally complex system unworkable. Because of that, anything that's in the default code, adds to the default admin, and is built in is something I can't see ever being OK with.

          All of this has already been tried at FB and it was a mistake. This ends up looking functionally very similar to 0.89-fb's favored nodes (only assign regions to a specific set of machines configured by the admin). It's so bad that almost every time we try to solve an issue on a cluster with favored nodes, the first thing we do is turn off the balancer so that we don't have to worry about which nodes are configured to have which regions. That's literally step one of debugging: turn off this feature.
          We'll have a party when FB no longer has this operational nightmare. I won't sign anyone up for the same. I won't sign myself up for the same.

          So I'm -1 on anything that I can't completely remove. rm -rf.

          • Assignment manager is already too complex; adding more complexity is awful.
          • Region movement is already too stateful; adding more is awful.
          • Configuration of HBase is already way too complex; multiplying that across multiple groups is awful.
          • Admin already has way too many things for users to do that cause issues; adding more ways for a cluster to be borked is awful.
          toffer Francis Liu added a comment -

          All of this has already been tried at FB and it was a mistake. This ends up looking functionally very similar to 0.89-fb's favored nodes (only assign regions to a specific set of machines configured by the admin). It's so bad that almost every time we try to solve an issue on a cluster with favored nodes, the first thing we do is turn off the balancer so that we don't have to worry about which nodes are configured to have which regions. That's literally step one of debugging: turn off this feature.
          We'll have a party when FB no longer has this operational nightmare. I won't sign anyone up for the same. I won't sign myself up for the same.

          IMHO that's not an objective comparison. Favored Nodes and region server groups are very different: their use cases are very different, and so are their implementations.

          As for how useful it is for us (and potentially for others): if we actually removed region server groups, I'm pretty sure our HBase team and SEs would revolt. If we didn't have this feature we would be managing around 80 HBase clusters right now instead of the six multi-tenant clusters we are currently running. Step one of debugging is not turning off the balancer; that would make things worse. In fact, one of the useful features of region server groups is quickly isolating tables to a new group if they are misbehaving or their workload has changed. This can be done in a few minutes, if not seconds.
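
          To give a concrete picture of that isolation workflow (commands per the rsgroup shell module; group, host, and table names below are placeholders, and exact syntax may vary by version):

          ```
          hbase(main):001:0> add_rsgroup 'quarantine'
          hbase(main):002:0> move_servers_rsgroup 'quarantine', ['host1:16020']
          hbase(main):003:0> move_tables_rsgroup 'quarantine', ['heavy_table']
          ```

          After the moves, only regionservers in the 'quarantine' group host the table's regions, so the rest of the cluster is shielded from its workload.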

          Assignment manager is already too complex adding more complexity is awful

          If you look at the patch, the change in the AM is an extra 20 lines of code. 6 lines are just bugfixes that should be done to the AM anyway, and the other 14 lines, which are fairly straightforward, we can even live without if that's what it takes.

          region movement is already too stateful. Adding more is awful

          Adding more states?

          Configuration of HBase is already way too complex. Multiplying that with multiple groups is awful.

          Not sure what the concern here is? That there's an option to configure a different balancer?

          apurtell Andrew Purtell added a comment -

          Agreed we should evaluate these changes on their own merit.

          No core changes at all isn't a reasonable precondition, because even with implementing this as a coprocessor-based feature we'd need new hooks, and those hooks would touch the same places. IMHO implementing it as a coprocessor-based feature makes little sense given the relatively small changes needed: it would unnecessarily pollute our space of admin APIs and shell commands with separate APIs for an optional feature, like we have with security.

          Agreed the ops impact should be totally optional. Where these changes modify existing function it's reasonable to make sure the modified behavior all sits behind configuration based conditionals.
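
          For reference, in the feature as ultimately shipped (per the release note above), that conditionality is exactly how it works: nothing is active unless the endpoint coprocessor and the group-aware balancer are explicitly configured in hbase-site.xml and the Master restarted.

          ```xml
          <property>
            <name>hbase.coprocessor.master.classes</name>
            <value>org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint</value>
          </property>
          <property>
            <name>hbase.master.loadbalancer.class</name>
            <value>org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer</value>
          </property>
          ```

          Without these two properties, clusters run the stock balancer and no rsgroup code paths are exercised.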

          apurtell Andrew Purtell added a comment -

          I also find this a lot less impactful than MOB yet that went in. I think we need to be rational about the risks this change actually imposes.

          eclark Elliott Clark added a comment -

          I also find this a lot less impactful than MOB

          This is touching the code that empirically has been the worst part of HBase, and making it more complex. It's adding things that require user configuration, and things that require user interaction on failure (failed open is an awful thing to do to a table).

          This adds another table that's required for the master to interact with for assignment and region movement. Right now master is really struggling because of the feedback loop of rit -> meta -> meta moves -> rit. Now we have double the number of tables.

          On top of that it adds more things to ZK while we're trying to remove as much data from there as possible.

          This patch breaks ipv6.
          This patch has Copy Paste code.
          This patch doesn't have the correct headers.
          This patch doesn't load meta's hri from disk on move.

          IMHO that's not an objective comparison. Favored Nodes and Region Server groups are very different.

          Not really they both are user configured places to put regions. Both features interact with the balancer and AM. Both features affect the cluster in disasters in basically the same way.

          Adding more states?

          Failed open due to bogus servers. So an old state used in a new way. That means something different than it did before.

          That there's an option to configure a different balancer?

          There are tons more configurations than that. There are namespace groups, table groups, and groups for regionservers to be part of. Then there's all of the different configuration for the servers that are in different groups.

          This is the worst possible way to do multi-tenancy. We're layering a hack on rather than doing the real thing.
          You're getting the same number of SPOFs and added operational complexity.

          If we want multi-tenancy, then lets do that right. We shouldn't accept hacks that don't help most users and make getting to the correct solution harder.

          enis Enis Soztutar added a comment -

          This patch breaks ipv6.

          This patch has Copy Paste code.

          This patch doesn't have the correct headers.

          This patch doesn't load meta's hri from disk on move.

          These are review comments that are not related to the discussion right now.

          If we want multi-tenancy, then lets do that right. We shouldn't accept hacks that don't help most users and make getting to the correct solution harder.

          I don't understand your reasoning. Yahoo has been running with this for more than a year, and we have other users that will run with RS groups once it is in. The way I see it, this is not a hack, but a step towards better guarantees. Multi-tenancy and isolation will never be perfect; improving them will always be an ongoing effort. For example, without making coprocessors run in a colocated container, we cannot isolate one tenant's coprocessors from another's today. Saying that we should fix EVERY multi-tenancy/isolation problem, which could take years, is not realistic.

          Agreed the ops impact should be totally optional.

          +1 to this. I think with the patch it is already the case: if you are not using the feature, you should not be affected in any way. Yes, with RS groups, if you do not know what you are doing, there are more states and more things can go wrong, but we are not making it the default or anything. The Memcached-based block cache is only used by FB and is just added complexity for remaining users, but we have it regardless.

          eclark Elliott Clark added a comment -

          we have other users that will run with RS groups once it is in.

          We're failing all of those users then.

          The Memcached-based block cache is only used by FB and is just added complexity for remaining users, but we have it regardless.

          It's a pluggable thing with it off and no added complexity to the default path. If you think it should be in a different module I would be amenable to moving it.
          As I've said before, that's what I think we should do here: make it pluggable with no changes to the core, and I'm fine with it.

          toffer Francis Liu added a comment -

          It's adding things that require user configuration. Adding things that require user interaction on failure (failed open is an awful thing to do to a table).

          This is a key feature; it's better to penalize a misbehaving table than to have it affect and potentially take down the entire cluster. This will only happen if all the servers in the group are down, analogous to a cluster being down. It's pretty intuitive. We can add a better message if need be.

          This adds another table that's required for the master to interact with for assignment and region movement. Right now master is really struggling because of the feedback loop of rit -> meta -> meta moves -> rit. Now we have double the number of tables.

          It actually doesn't. The table is only read at startup and never again.

          On top of that it adds in more things to Zk while we're trying to remove as much data from there as possible.

          Yes, this will be addressed just like the other modules that currently use zk.

          This patch breaks ipv6.
          This patch has Copy Paste code.
          This patch doesn't have the correct headers.
          This patch doesn't load meta's hri from disk on move.

          Thanks for the review. I'll make sure to address them. Can you point out which code is copy-paste? Also, please elaborate on the meta hri issue? Prolly post it on RB so we don't pollute this discussion.

          Not really, they both are user-configured places to put regions. Both features interact with the balancer and AM. Both features affect the cluster in disasters in basically the same way.

          IMHO this is a huge oversimplification. I agree with Andrew Purtell: let's evaluate regionserver groups on their own merits. i.e. the user configures regionservers to put tables on, not regions. Regionservers can be any number, while in FN it can be at most (or only?) 3. FN was designed to guarantee performance; RS groups were meant to guarantee isolation.

          There are tons more configurations than that. There's namespace groups, table groups and groups for regionservers to be part of.

          Right now there's only one kind of group, and it can have regionservers and tables as members.

          Then there's all of the different configurations for the servers that are in different groups.

          This is no different from configuring different clusters and is optional. A really good feature IMHO.

          This is the worst possible way to do multi-tenancy. We're layering a hack on rather than do the real thing.
          You're getting the same number of SPOFs and added operational complexity.

          IMHO the operational complexity is better than managing the same number of clusters. I honestly would not like to increase my number of SPOFs, which increases the probability of one going down. Also, isn't there NN HA already?

          If we want multi-tenancy, then let's do that right. We shouldn't accept hacks that don't help most users and make getting to the correct solution harder.

          IMHO this will help a lot of users. This is not only a multi-tenancy solution but an isolation solution. It will help them isolate different workloads (i.e. CD/PROD, batch/realtime). It will also help them keep system tables from becoming unavailable because a user access pattern or data surfaced a bug which caused the RS to die. It also allows you to configure each group differently (i.e. GC settings). It also provides coprocessor isolation.
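          The membership model being debated here — each server and table belongs to exactly one group, everything unassigned falls into "default", and only same-group servers host a table's regions — can be sketched in a few lines. This is an illustrative model only; the class and method names are not the actual HBase API.

          ```java
          import java.util.*;

          /** Hypothetical sketch of the rsgroup membership model: one group per
           *  server/table, with "default" holding anything not explicitly assigned. */
          public class RSGroupModelSketch {
              private final Map<String, String> serverToGroup = new HashMap<>();
              private final Map<String, String> tableToGroup = new HashMap<>();

              public void moveServer(String server, String group) {
                  serverToGroup.put(server, group); // implicitly leaves the old group
              }

              public void moveTable(String table, String group) {
                  tableToGroup.put(table, group);
              }

              public String groupOfServer(String server) {
                  return serverToGroup.getOrDefault(server, "default");
              }

              public String groupOfTable(String table) {
                  return tableToGroup.getOrDefault(table, "default");
              }

              /** Servers eligible to host a table's regions: only those in its group. */
              public List<String> candidateServers(String table, Collection<String> liveServers) {
                  String group = groupOfTable(table);
                  List<String> out = new ArrayList<>();
                  for (String s : liveServers) {
                      if (groupOfServer(s).equals(group)) out.add(s);
                  }
                  return out;
              }
          }
          ```

          The point of contention above is exactly this restriction: the candidate set for a table's regions shrinks to its group, which buys isolation at the cost of fewer hosts to fail over to.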

          eclark Elliott Clark added a comment -

          IMHO this is a huge oversimplification.

          All of the bad parts are the same. You still have to config it. You still have really weird behavior on cluster instability.

          Right now there's only one kind of group, and it can have regionservers and tables as members.

          The patch has a group attached to namespaces, and also provides the ability to move tables to different groups.

          IMHO this will help a lot of users.

          Every user that's running this that doesn't have 10K machines is making a huge mistake. And we as the HBase developers should provide the better solution not the easiest solution. Not the one that works for one company.

          This is not only a multi-tenancy solution but an isolation solution.

          Isolation is the worst solution for getting multi-tenancy.

          ndimiduk Nick Dimiduk added a comment -

          This is not only a multi-tenancy solution but an isolation solution.

          Isolation is the worst solution for getting multi-tenancy.

          Maybe you can elaborate on this point? Seems we need to quarantine users from each other, whether that's physically as per this patch or via imposed resource controls within a single process. Either way, "quarantine" is the same as "isolation"; we're isolating users from each other to achieve fairness of service delivery in a multi-tenant environment.

          apurtell Andrew Purtell added a comment -

          Let's not let the perfect be the enemy of the good. A "proper" multi-layer admission control change isn't on the table, it isn't on anyone's roadmap, it isn't even something proposed on a JIRA and/or in a design document. Even if we have a proposal for HBase this will certainly be considered imperfect and incomplete by some without 100% agreement and a plan at the HDFS level, and getting that is as likely as finding a unicorn wandering around downtown SF. (Ok... maybe a horse dressed to look like a unicorn could be a thing...) Meanwhile we have a patch on deck and we need to be evaluating it and its contributor's concerns on their merit.

          This is something that one of our esteemed users runs in production, is persistent about getting in and responsive to feedback, and both of those things in my opinion should carry a lot of weight. The same kind of weight that previously proposed changes like HFileV2 or the 0.90 master rewrite (remember that?), or the memcache-based block cache carried, or pending IPv6 related changes.

          That said, I don't think it's ready to be merged into master. We have it up in a feature branch. Let's continue that, address concerns, make sure it's totally optional for those who don't want it, measure its impact.

          eclark Elliott Clark added a comment -

          Maybe you can elaborate on this point?

          Sure let me write up something more on that point.

          Let's not let the perfect be the enemy of the good.

          I'm not asking for perfect. I'm asking not to take a half step back for most users so that one user can merge this feature and take a single step forward.

          Meanwhile we have a patch on deck and we need to be evaluating it and its contributor's concerns on their merit.

          Yeah and I have evaluated it and used my knowledge and judgement and cast my vote on this patch and feature in accordance with the Apache Foundation rules (http://www.apache.org/foundation/voting.html#votes-on-code-modification) . I have added my technical reasons. I have even outlined what I would need to vote a different way.

          the 0.90 master rewrite (remember that?)

          Yeah that has since proved to be the wrong way to go. It put way too much in ZK and now we've spent years un-doing it. We would have been better served with asking for parts to be pluggable and not on by default.

          or the memcache-based block cache carried

          That followed the exact same bar that I'm requesting here. No added complexity on the default use case. I've even moved it into a different module so that it will be exactly what I'm asking for here.

          apurtell Andrew Purtell added a comment -

          No added complexity on the default use case.

          With the group assignment logic changes behind a default-off conditional, that criterion is met here too.

          apurtell Andrew Purtell added a comment -

          Yeah and I have evaluated it and used my knowledge and judgement and cast my vote on this patch and feature in accordance with the Apache Foundation rules (http://www.apache.org/foundation/voting.html#votes-on-code-modification) . I have added my technical reasons

          Let me collect them here:

          • Don't break ipv6.
          • Remove Copy Paste code.
          • Add correct headers.
          • Load meta's hri from disk on move.
          • No added complexity on the default path

          I'd like to add:

          • Ops impact should be totally optional: configurable, default off. (This is related to the last point above.)

          If I've missed any items please let me know.

          toffer Francis Liu added a comment -

          In the interest of moving this feature forward I propose we change the current patch to use a coprocessor-based implementation. The broad strokes of the implementation are:

          1. Group APIs will move to a coprocessor endpoint.
          2. Other hooks into core code will utilize coprocessor observers, possibly adding more cp hooks in core as needed.
          3. Security checks will be embedded directly into the endpoint and activated if security is enabled.
          4. Add a new API/semantic in LoadBalancer so that it can tell callers when regions should not be assigned (i.e. a bogus server name).
          5. The CLI will remain as-is (same as security).

          Elliott Clark does this work for you?
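          Point 4 of the proposal — a balancer signalling "do not assign" by handing back a bogus server name — could look roughly like the sketch below. The names are hypothetical stand-ins, not the real LoadBalancer interface.

          ```java
          import java.util.*;

          /** Illustrative sketch of a group-aware assignment that returns a
           *  sentinel bogus server name when a region's group has no live server,
           *  so callers know to leave the region unassigned rather than fail open. */
          public class GroupAwareAssignmentSketch {
              /** Sentinel returned when no server in the region's group is live. */
              public static final String BOGUS_SERVER = "bogus.server,1,1";

              public static String assign(String regionGroup,
                                          Map<String, String> serverToGroup,
                                          List<String> liveServers) {
                  for (String s : liveServers) {
                      if (regionGroup.equals(serverToGroup.get(s))) {
                          return s; // assign within the region's own group
                      }
                  }
                  return BOGUS_SERVER; // callers treat this as "do not assign"
              }
          }
          ```

          The design point is that the balancer stays the single place that knows about groups; callers only need to recognize the sentinel instead of learning group semantics themselves.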

          eclark Elliott Clark added a comment -

          That works great for me and would address the concerns I had.

          enis Enis Soztutar added a comment -

          Great.
          We can in theory define another Observer type (RSGroupObserver, etc.) that coprocessors might implement, for example to learn about RS group operations. But that would be a coprocessor having its own coprocessors, which doesn't seem needed at the moment.

          toffer Francis Liu added a comment -

          Enis Soztutar I can do that too and have security make use of it. It should be just as much effort as embedding security directly, since I already have the RS group cp hooks in place as part of the current patch.

          apurtell Andrew Purtell added a comment -

          But it will be a coprocessor having its own co-processors which seems is not needed at the moment.

          Yeah, let's avoid that until/unless we really need it

          enis Enis Soztutar added a comment -

          Agreed, let's keep it simple for now unless needed.

          toffer Francis Liu added a comment -

          Latest patch attached. The patch makes the implementation cp-based.

          To avoid confusion I've created a new RB entry:

          https://reviews.apache.org/r/38224/

          Andrew Purtell The new patch is based off of current master. Should we just replace the current branch with a new branch to go with the new patch? Also, I was wondering: since this is now cp-based, do we still need a branch?

          Elliott Clark I believe I've addressed most of your concerns. The only thing I missed was "Load meta's hri from disk on move"; can you elaborate more on what needs to be done, on RB?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12754964/HBASE-6721_13.patch
          against master branch at commit 27d3ab43efeabe2a0e1173858b6994b17b5c355b.
          ATTACHMENT ID: 12754964

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 27 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 checkstyle. The applied patch does not increase the total number of checkstyle errors

          -1 findbugs. The patch appears to cause Findbugs (version 2.0.3) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code>
          + * <code>rpc GetGroupInfoOfServer(.hbase.pb.GetGroupInfoOfServerRequest) returns (.hbase.pb.GetGroupInfoOfServerResponse);</code>
          + * <code>rpc MoveServers(.hbase.pb.MoveServersRequest) returns (.hbase.pb.MoveServersResponse);</code>
          + * <code>rpc MoveTables(.hbase.pb.MoveTablesRequest) returns (.hbase.pb.MoveTablesResponse);</code>
          + * <code>rpc RemoveGroup(.hbase.pb.RemoveGroupRequest) returns (.hbase.pb.RemoveGroupResponse);</code>
          + * <code>rpc BalanceGroup(.hbase.pb.BalanceGroupRequest) returns (.hbase.pb.BalanceGroupResponse);</code>
          + * <code>rpc ListGroupInfos(.hbase.pb.ListGroupInfosRequest) returns (.hbase.pb.ListGroupInfosResponse);</code>
          + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code>

          -1 site. The patch appears to cause mvn post-site goal to fail.

          -1 core tests. The patch failed these unit tests:

          -1 core zombie tests. There are 1 zombie test(s):

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15506//testReport/
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15506//artifact/patchprocess/checkstyle-aggregate.html

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15506//console

          This message is automatically generated.

          eclark Elliott Clark added a comment -

          Commenting on RB now.

          toffer Francis Liu added a comment -

          Addressed comments on RB.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12756192/HBASE-6721_14.patch
          against master branch at commit d2e338181800ae3cef55ddca491901b65259dc7f.
          ATTACHMENT ID: 12756192

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 27 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          -1 checkstyle. The applied patch generated 1859 checkstyle errors (more than the master's current 1835 errors).

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code>
          + * <code>rpc GetGroupInfoOfServer(.hbase.pb.GetGroupInfoOfServerRequest) returns (.hbase.pb.GetGroupInfoOfServerResponse);</code>
          + * <code>rpc MoveServers(.hbase.pb.MoveServersRequest) returns (.hbase.pb.MoveServersResponse);</code>
          + * <code>rpc MoveTables(.hbase.pb.MoveTablesRequest) returns (.hbase.pb.MoveTablesResponse);</code>
          + * <code>rpc RemoveGroup(.hbase.pb.RemoveGroupRequest) returns (.hbase.pb.RemoveGroupResponse);</code>
          + * <code>rpc BalanceGroup(.hbase.pb.BalanceGroupRequest) returns (.hbase.pb.BalanceGroupResponse);</code>
          + * <code>rpc ListGroupInfos(.hbase.pb.ListGroupInfosRequest) returns (.hbase.pb.ListGroupInfosResponse);</code>
          + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code>

          +1 site. The mvn post-site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestAsyncProcess

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//testReport/
          Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//artifact/patchprocess/newFindbugsWarnings.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//artifact/patchprocess/checkstyle-aggregate.html

          Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//artifact/patchprocess/patchJavadocWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756192/HBASE-6721_14.patch against master branch at commit d2e338181800ae3cef55ddca491901b65259dc7f. ATTACHMENT ID: 12756192 +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 27 new or modified tests. +1 hadoop versions . The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 protoc . The applied patch does not increase the total number of protoc compiler warnings. -1 javadoc . The javadoc tool appears to have generated 6 warning messages. -1 checkstyle . The applied patch generated 1859 checkstyle errors (more than the master's current 1835 errors). +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 lineLengths . 
The patch introduces the following lines longer than 100: + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code> + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code> + * <code>rpc GetGroupInfoOfServer(.hbase.pb.GetGroupInfoOfServerRequest) returns (.hbase.pb.GetGroupInfoOfServerResponse);</code> + * <code>rpc MoveServers(.hbase.pb.MoveServersRequest) returns (.hbase.pb.MoveServersResponse);</code> + * <code>rpc MoveTables(.hbase.pb.MoveTablesRequest) returns (.hbase.pb.MoveTablesResponse);</code> + * <code>rpc RemoveGroup(.hbase.pb.RemoveGroupRequest) returns (.hbase.pb.RemoveGroupResponse);</code> + * <code>rpc BalanceGroup(.hbase.pb.BalanceGroupRequest) returns (.hbase.pb.BalanceGroupResponse);</code> + * <code>rpc ListGroupInfos(.hbase.pb.ListGroupInfosRequest) returns (.hbase.pb.ListGroupInfosResponse);</code> + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code> + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code> +1 site . The mvn post-site goal succeeds with this patch. -1 core tests . The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAsyncProcess Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15618//console This message is automatically generated.
          apurtell Andrew Purtell added a comment - - edited

          Andrew Purtell The new patch is based off of current master. Should we just replace the current branch with a new branch to go with the new patch?

          I would say so if you want it.

          Also was wondering since this is now cp-based. Do we still need a branch?

          Up to you. We can rebase or delete, either way let me know.

          Some issues with the current security integration. Coprocessors can't call into the internals of other coprocessors. I understand why this was done here, but we can't have it: keeping coprocessors from reaching into each other's internals is a non-negotiable point for the sake of sanity in maintaining separate optional extensions. It's a catch-22 imposed on this change by the requirement that it be a coprocessor-only implementation.

          What I would suggest is introducing hooks for the group admin APIs into the MasterObserver API. Let the implementation of the group admin APIs and the authoritative security decisions be separate mix-ins provided by different coprocessors. There needs to be common plumbing for the two, and that belongs in MasterObserver. The plumbing could look like:

          • MasterObserver support for pre/post group admin API action hooks
          • In GroupAdminEndpoint, get the coprocessor host with getMasterCoprocessorHost()
          • Invoke the public (technically, LimitedPrivate(COPROC)) APIs for pre/post group admin API actions.
          • AccessController implements the new MasterObserver APIs to provide security for the group admin APIs.

          This is much more in spirit with current interfaces and audience scoping. It decouples GroupAdminEndpoint from AccessController. (If the AC is not installed, no harm, no NPEs, no security checking (by intention), it's all good.) It also addresses concerns about zero impact in the default case. Those upcalls will never be made unless the GroupAdminEndpoint is installed.
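          The plumbing described above can be sketched as follows. This is a minimal, self-contained illustration of the pre/post hook pattern, not the real HBase API: GroupObserver, CoprocessorHost, and GroupAdminEndpoint here are simplified stand-ins for MasterObserver, MasterCoprocessorHost, and the actual endpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the MasterObserver-style hook interface.
interface GroupObserver {
  // Pre-hook may veto the action by throwing; post-hook is notification only.
  void preAddGroup(String group) throws SecurityException;
  void postAddGroup(String group);
}

// Stand-in for the master coprocessor host: fans calls out to every
// installed observer. If none is installed, the upcalls are no-ops.
class CoprocessorHost {
  private final List<GroupObserver> observers = new ArrayList<>();
  void install(GroupObserver o) { observers.add(o); }
  void preAddGroup(String g) { for (GroupObserver o : observers) o.preAddGroup(g); }
  void postAddGroup(String g) { for (GroupObserver o : observers) o.postAddGroup(g); }
}

// Stand-in for GroupAdminEndpoint: invokes the host's hooks around the
// action, never calling into another coprocessor's internals directly.
class GroupAdminEndpoint {
  private final CoprocessorHost host;
  final List<String> groups = new ArrayList<>();
  GroupAdminEndpoint(CoprocessorHost host) { this.host = host; }
  void addGroup(String name) {
    host.preAddGroup(name);   // an AccessController (if installed) can deny here
    groups.add(name);         // the actual admin action
    host.postAddGroup(name);  // e.g. audit logging
  }
}

public class HookSketch {
  public static void main(String[] args) {
    CoprocessorHost host = new CoprocessorHost();
    GroupAdminEndpoint admin = new GroupAdminEndpoint(host);
    admin.addGroup("my_group");  // no observer installed: no checks, no NPEs

    // Install a stand-in "AccessController" that denies everything.
    host.install(new GroupObserver() {
      public void preAddGroup(String g) { throw new SecurityException("denied: " + g); }
      public void postAddGroup(String g) { }
    });
    try {
      admin.addGroup("other_group");
    } catch (SecurityException e) {
      System.out.println(e.getMessage());
    }
    System.out.println(admin.groups);
  }
}
```

          The key property is visible in main: with no observer installed the upcalls are harmless no-ops, and an installed access controller can veto the action without the endpoint ever referencing it directly.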

          toffer Francis Liu added a comment -

          Addressed comments on RB.

          Andrew Purtell let me know if the changes I made for security are what you expected. I pretty much just ported the security/cp stuff from the non-cp patch.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12764454/HBASE-6721_15.patch
          against master branch at commit 24370547c500df0026a71944d8be88cd5b51b23e.
          ATTACHMENT ID: 12764454

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 33 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1)

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 3 warning messages.

          -1 checkstyle. The applied patch generated 1811 checkstyle errors (more than the master's current 1781 errors).

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code>
          + * <code>rpc GetGroupInfoOfServer(.hbase.pb.GetGroupInfoOfServerRequest) returns (.hbase.pb.GetGroupInfoOfServerResponse);</code>
          + * <code>rpc MoveServers(.hbase.pb.MoveServersRequest) returns (.hbase.pb.MoveServersResponse);</code>
          + * <code>rpc MoveTables(.hbase.pb.MoveTablesRequest) returns (.hbase.pb.MoveTablesResponse);</code>
          + * <code>rpc RemoveGroup(.hbase.pb.RemoveGroupRequest) returns (.hbase.pb.RemoveGroupResponse);</code>
          + * <code>rpc BalanceGroup(.hbase.pb.BalanceGroupRequest) returns (.hbase.pb.BalanceGroupResponse);</code>
          + * <code>rpc ListGroupInfos(.hbase.pb.ListGroupInfosRequest) returns (.hbase.pb.ListGroupInfosResponse);</code>
          + * <code>rpc GetGroupInfo(.hbase.pb.GetGroupInfoRequest) returns (.hbase.pb.GetGroupInfoResponse);</code>
          + * <code>rpc GetGroupInfoOfTable(.hbase.pb.GetGroupInfoOfTableRequest) returns (.hbase.pb.GetGroupInfoOfTableResponse);</code>

          +1 site. The mvn post-site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestAsyncProcess

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15832//testReport/
          Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15832//artifact/patchprocess/newFindbugsWarnings.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15832//artifact/patchprocess/checkstyle-aggregate.html

          Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15832//artifact/patchprocess/patchJavadocWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15832//console

          This message is automatically generated.

          enis Enis Soztutar added a comment -

          I backported the current v15 patch for testing it with 1.1 code base. Just parking it here in case somebody else needs it. I'll get back to the review soon.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12768882/hbase-6721-v15-branch-1.1.patch
          against branch-1.1 branch at commit 496d20cfca5a30bc72a29e4ef893424964f9fa91.
          ATTACHMENT ID: 12768882

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 33 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16242//console

          This message is automatically generated.

          enis Enis Soztutar added a comment -

          Finally got around to testing the v15 patch on the 1.1 code base with a 7 node cluster. Here are my test notes. Nothing too concerning, but we have to address some of these in the patch. This is the configuration to add to enable groups:

              <property>
                <name>hbase.coprocessor.master.classes</name>
                <value>org.apache.hadoop.hbase.group.GroupAdminEndpoint</value>
              </property>
              <property>
                <name>hbase.master.loadbalancer.class</name>
                <value>org.apache.hadoop.hbase.group.GroupBasedLoadBalancer</value>
              </property>
          

          1. Need to add this diff so that the new PB files get compiled with the -Pcompile-protobuf profile:

          diff --git hbase-protocol/pom.xml hbase-protocol/pom.xml
          index 8034576..d352373 100644
          --- hbase-protocol/pom.xml
          +++ hbase-protocol/pom.xml
          @@ -180,6 +180,8 @@
                                     <include>ErrorHandling.proto</include>
                                     <include>Filter.proto</include>
                                     <include>FS.proto</include>
          +                          <include>Group.proto</include>
          +                          <include>GroupAdmin.proto</include>
                                     <include>HBase.proto</include>
                                     <include>HFile.proto</include>
                                     <include>LoadBalancer.proto</include>
          

          2. NPE in group shell commands with nonexisting groups:

          hbase(main):015:0* balance_group 'nonexisting' 
          
          ERROR: java.io.IOException
          	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
          	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
          	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
          	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
          	at java.lang.Thread.run(Thread.java:745)
          Caused by: java.lang.NullPointerException
          	at org.apache.hadoop.hbase.group.GroupAdminServer.groupGetRegionsInTransition(GroupAdminServer.java:412)
          	at org.apache.hadoop.hbase.group.GroupAdminServer.balanceGroup(GroupAdminServer.java:348)
          	at org.apache.hadoop.hbase.group.GroupAdminEndpoint.balanceGroup(GroupAdminEndpoint.java:229)
          	at org.apache.hadoop.hbase.protobuf.generated.GroupAdminProtos$GroupAdminService.callMethod(GroupAdminProtos.java:11156)
          	at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:666)
          	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:51121)
          	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
          

          and

          hbase(main):030:0> get_group 'nonexisting'
          GROUP INFORMATION                                                                                                                                                                                                                                                  
          Servers:                                                                                                                                                                                                                                                           
          
          ERROR: undefined method `getServers' for nil:NilClass
          
          Here is some help for this command:
          Get a region server group's information.
          
          Example:
          
            hbase> get_group 'default'
          

          and

          hbase(main):077:0* move_group_tables 'nonexisting'
          
          ERROR: undefined method `each' for nil:NilClass
          
          Here is some help for this command:
          Reassign tables from one group to another.
          
            hbase> move_group_tables 'dest',['table1','table2']
          

          and

          hbase(main):173:0* move_group_servers 'nonexisting'
          
          ERROR: undefined method `each' for nil:NilClass
          
          Here is some help for this command:
          Reassign a region server from one group to another.
          
            hbase> move_group_servers 'dest',['server1:port','server2:port']
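          A guard along the following lines would turn the NPEs above into descriptive errors. This is a hypothetical sketch, not the actual GroupAdminServer code: the GroupGuardSketch class and its map-backed store are invented for illustration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class GroupGuardSketch {
  // Invented stand-in for the group store consulted by the admin server.
  static final Map<String, Set<String>> GROUPS = new HashMap<>();

  // Look the group up once and throw a descriptive exception instead of
  // returning null and letting it propagate into balanceGroup/getServers.
  static Set<String> getGroupServers(String group) throws IOException {
    Set<String> servers = GROUPS.get(group);
    if (servers == null) {
      throw new IOException("RSGroup does not exist: " + group);
    }
    return servers;
  }

  public static void main(String[] args) {
    GROUPS.put("default", Set.of("host1:16020", "host2:16020"));
    try {
      getGroupServers("nonexisting");
    } catch (IOException e) {
      System.out.println(e.getMessage()); // clear error instead of an NPE
    }
  }
}
```

          The same check, applied at each admin entry point, would also give the shell commands a real error message to display instead of the nil:NilClass failures above.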
          

          3. Group names should be restricted to alphanumeric only. This one is pretty easy, but important. The following caused the master to abort, and the master cannot restart after this point (without manually removing the rsgroup entry from the table, which you cannot do without the master). I had to nuke the hdfs and zk to start over.

          hbase(main):033:0> add_group 'a-/:*'
          
          ERROR: java.io.IOException: Failed to write to groupZNode
          	at org.apache.hadoop.hbase.group.GroupInfoManagerImpl.flushConfig(GroupInfoManagerImpl.java:419)
          	at org.apache.hadoop.hbase.group.GroupInfoManagerImpl.addGroup(GroupInfoManagerImpl.java:152)
          	at org.apache.hadoop.hbase.group.GroupAdminServer.addGroup(GroupAdminServer.java:298)
          	at org.apache.hadoop.hbase.group.GroupAdminEndpoint.addGroup(GroupAdminEndpoint.java:197)
          	at org.apache.hadoop.hbase.protobuf.generated.GroupAdminProtos$GroupAdminService.callMethod(GroupAdminProtos.java:11146)
          	at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:666)
          	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:51121)
          	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
          	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
          	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
          	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
          	at java.lang.Thread.run(Thread.java:745)
          Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/groupInfo/a-/:*
          	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
          	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
          	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
          	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:575)
          	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:554)
          	at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1261)
          	at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1250)
          	at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1233)
          	at org.apache.hadoop.hbase.group.GroupInfoManagerImpl.flushConfig(GroupInfoManagerImpl.java:408)
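          A validation along the suggested lines could look like the following hypothetical sketch (checkGroupName and the pattern are invented names; the real fix would live in GroupInfoManagerImpl or the admin entry points). The point is to reject a name like 'a-/:*' up front, before it is embedded in a znode path under /hbase/groupInfo.

```java
import java.util.regex.Pattern;

public class GroupNameSketch {
  // Alphanumerics plus underscore: safe to embed in a ZooKeeper znode path.
  private static final Pattern VALID = Pattern.compile("[a-zA-Z0-9_]+");

  static void checkGroupName(String name) {
    if (name == null || !VALID.matcher(name).matches()) {
      throw new IllegalArgumentException("Illegal RSGroup name: " + name);
    }
  }

  public static void main(String[] args) {
    checkGroupName("my_group");  // accepted
    try {
      checkGroupName("a-/:*");   // would have become .../groupInfo/a-/:*
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```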
          

          4. get_table_group and get_server_group shell commands do not work

          hbase(main):019:0* get_table_group 'nonexisting'
          
          ERROR: undefined local variable or method `s' for #<Hbase::GroupAdmin:0x64518270>
          
          Here is some help for this command:
          Get the group name the given table is a member of.
          
            hbase> get_table_group 'myTable'
          
           
          hbase(main):022:0* get_server_group 'server'
          
          ERROR: undefined local variable or method `s' for #<Hbase::GroupAdmin:0x64518270>
          
          Here is some help for this command:
          Get the group name the given region server is a member of.
          
            hbase> get_server_group 'server1:port1
          

          5. move_group_servers and move_group_tables report the expected argument count as 1, although it should be 2:

          hbase(main):033:0* move_group_servers 
          
          ERROR: wrong number of arguments (0 for 1)
          
          Here is some help for this command:
          Reassign a region server from one group to another.
          
            hbase> move_group_servers 'dest',['server1:port','server2:port']
          

          6. Adding a server without a port throws an error, but with no explanation (this one is minor, not that important).

          hbase(main):070:0> move_group_servers 'group2', ['os-enis-hbase-oct27-a-3.novalocal']  
          
          ERROR: 
          
          Here is some help for this command:
          Reassign a region server from one group to another.
          
            hbase> move_group_servers 'dest',['server1:port','server2:port']
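          A hypothetical argument check illustrating a fix for this: fail fast with a message that names the missing port instead of an empty ERROR line. parseServer and its behavior are invented for illustration; the real code would presumably reuse HBase's server name parsing.

```java
public class ServerArgSketch {
  // Split "hostname:port", rejecting inputs with no port (or no hostname)
  // with a message that tells the user what the expected form is.
  static String[] parseServer(String s) {
    int idx = s.lastIndexOf(':');
    if (idx <= 0 || idx == s.length() - 1) {
      throw new IllegalArgumentException(
          "Server name must be of the form hostname:port, got: " + s);
    }
    int port = Integer.parseInt(s.substring(idx + 1)); // fails fast on a non-numeric port
    return new String[] { s.substring(0, idx), Integer.toString(port) };
  }

  public static void main(String[] args) {
    String[] ok = parseServer("os-enis-hbase-oct27-a-3.novalocal:16020");
    System.out.println(ok[0] + " / " + ok[1]);
    try {
      parseServer("os-enis-hbase-oct27-a-3.novalocal"); // port missing
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```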
          

          7. From all the above, it is clear that we need unit tests covering the new shell commands.

          Other than these, the feature is working as expected. Defining groups, moving servers and tables work. Regions get reassigned according to their groups. Restarting the cluster keeps assignments, etc.

          Some more findings:
          Test 1:
          Killed the last regionserver of a group, all 15 regions are in FAILED_OPEN state.

          • restarted the master, regions still in FAILED_OPEN state (which is expected)
          • Added a new server to the group which had no remaining servers, regions still in FAILED_OPEN state (this is probably due to how assignment works, we give up after 10 retries and wait for manual assignment or master restart)
          • Started the region server that was killed before, still in FAILED_OPEN
          • Master restart reassigned these regions.

          Test 2:
          Tried to move all servers to a single group. Correctly handles last server in the default group by not allowing it to change.
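          The default-group protection observed in Test 2 can be sketched as a hypothetical pre-move check (the class and method names here are invented, not the patch's actual code):

```java
import java.util.HashSet;
import java.util.Set;

public class DefaultGroupSketch {
  // Refuse a server move that would leave the "default" group empty, since
  // system tables assigned there would have nowhere to go.
  static void checkMove(Set<String> sourceGroupServers, Set<String> toMove,
                        String sourceGroup) {
    Set<String> remaining = new HashSet<>(sourceGroupServers);
    remaining.removeAll(toMove);
    if ("default".equals(sourceGroup) && remaining.isEmpty()) {
      throw new IllegalStateException("Cannot leave the default group empty");
    }
  }

  public static void main(String[] args) {
    Set<String> defaults = Set.of("host1:16020", "host2:16020");
    checkMove(defaults, Set.of("host1:16020"), "default");  // ok: one server remains
    try {
      checkMove(defaults, defaults, "default");             // would empty default
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```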

          Test 3:
          Killed the last server in the default group, while all system tables are in the default group (and hence in that server).
          -> hbase:meta was stuck in PENDING_OPEN on the bogus server localhost,1,1.
          -> Upon restarting the killed server, meta and the other tables in the default group (including the rsgroup table) got reassigned.
          As a side note, not having enough servers in the group that hosts the meta or rsgroup table seems like a very good way of shooting yourself in the foot. However, as discussed before, this may be needed for strong isolation.

          • Tried adding a non-existing server to the group; it is correctly not allowed.
          • Checked JMX
          • Adding group information for tables and regionserver to the master UI would be helpful. We can leave this to a follow up.
          • Obviously there should be a follow up to add at least some basic documentation on how to enable and configure and use RS groups in the book.