  1. HBase
  2. HBASE-26149

Further improvements on ConnectionRegistry implementations



    • Umbrella
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Client


      (Copied in-line from the attached 'Documentation' with some filler as connecting script)

      HBASE-23324 Deprecate clients that connect to Zookeeper

      ^^^ This has always been our goal: to remove the ZooKeeper dependency from the client side.


      See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry

      When constructing an RpcClient, we pass in the cluster id, which is used to select the authentication method. More specifically, it is used to select the tokens for digest based authentication; see the code in BuiltInProviderSelector. ZKConnectionRegistry does not need an RpcClient to connect to ZooKeeper, so we can get the cluster id first and then create the RpcClient. MasterRegistry/RpcConnectionRegistry, however, must use an RpcClient to connect to the ClientMetaService endpoints before they can call the getClusterId method. Because of this, when creating the RpcClient for MasterRegistry/RpcConnectionRegistry we can only pass null or the default cluster id, which means digest based authentication is broken.

      This is a cyclic dependency problem. One possible way forward is to make the getClusterId method available to all users, i.e. require no authentication for it, so that we can always call getClusterId with simple authentication. Then, on the client side, once we have the cluster id, we create a new RpcClient that selects the correct authentication method.
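      The two-phase idea above can be sketched roughly as follows. This is only an illustration of the ordering (fetch the cluster id without credentials, then build a second client that knows the id); SketchRpcClient, AuthMethod and bootstrap are hypothetical names, not HBase classes, and real provider selection in BuiltInProviderSelector is more involved.

```java
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not HBase code: every name below is illustrative.
final class ClusterIdBootstrap {

  /** Authentication modes a client can end up with in this sketch. */
  enum AuthMethod { SIMPLE, DIGEST }

  /** Stand-in for an RPC client created with a (possibly null) cluster id. */
  static final class SketchRpcClient {
    final String clusterId;
    final AuthMethod authMethod;

    SketchRpcClient(String clusterId, boolean hasTokenForCluster) {
      this.clusterId = clusterId;
      // Digest auth needs the cluster id to find a matching token;
      // without an id the client can only fall back to simple auth.
      this.authMethod = (clusterId != null && hasTokenForCluster)
          ? AuthMethod.DIGEST
          : AuthMethod.SIMPLE;
    }
  }

  /**
   * Phase 1: a client without a cluster id calls getClusterId (assumed to need no auth).
   * Phase 2: a new client is created with that id so token selection can work.
   */
  static SketchRpcClient bootstrap(Function<SketchRpcClient, String> getClusterId,
                                   Set<String> clusterIdsWithTokens) {
    SketchRpcClient unauthenticated = new SketchRpcClient(null, false);
    String clusterId = getClusterId.apply(unauthenticated);
    return new SketchRpcClient(clusterId, clusterIdsWithTokens.contains(clusterId));
  }
}
```

      The first client is thrown away after the lookup; only the second one, which knows the cluster id, is used for real requests.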

      The sub-task HBASE-26150 Let region server also carry ClientMetaService makes it possible for RegionServers to carry a ConnectionRegistry (rather than only the Masters, as is the case now). It adds a new method, getBootstrapNodes, to ClientMetaService (the ConnectionRegistry proto service) for refreshing the bootstrap nodes periodically or on error. The new RpcConnectionRegistry [created here but defined in the next sub-task] will use this method to refresh the bootstrap nodes, while the old MasterRegistry will use the getMasters method to refresh the ‘bootstrap’ nodes.

      The getBootstrapNodes method will return all the region servers, so after the first refresh the client will go to region servers for later rpc calls. But since masters and region servers both implement the ClientMetaService interface, the client is free to configure masters as the initial bootstrap nodes.
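      A minimal sketch of that refresh behaviour, assuming a client that starts from a configured node list and replaces it with whatever a getBootstrapNodes call returns. BootstrapNodeTracker and the host:port strings are hypothetical; the real RpcConnectionRegistry is asynchronous and handles errors and retries, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not HBase code. Not thread-safe.
final class BootstrapNodeTracker {
  private List<String> nodes;

  BootstrapNodeTracker(List<String> configuredNodes) {
    if (configuredNodes.isEmpty()) {
      throw new IllegalArgumentException("at least one bootstrap node is required");
    }
    this.nodes = new ArrayList<>(configuredNodes);
  }

  /** Returns a copy of the nodes the client would currently contact. */
  List<String> currentNodes() {
    return new ArrayList<>(nodes);
  }

  /**
   * Asks the first current node for the bootstrap list and switches to it.
   * An empty or null reply keeps the previous list, so the client never
   * ends up with nothing to connect to.
   */
  void refresh(Function<String, List<String>> getBootstrapNodes) {
    List<String> fetched = getBootstrapNodes.apply(nodes.get(0));
    if (fetched != null && !fetched.isEmpty()) {
      nodes = new ArrayList<>(fetched);
    }
  }
}
```

      A client configured with a master address would, after one refresh, hold only the region server addresses returned by the server.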

      The following sub-task then deprecates MasterRegistry, HBASE-26172 Deprecated MasterRegistry

      The implementation of MasterRegistry is almost the same as RpcConnectionRegistry, except that it uses getMasters instead of getBootstrapNodes to refresh the ‘bootstrap’ nodes it connects to. So we could add configs on the server side to control which nodes getBootstrapNodes returns to the client, i.e., masters or region servers; then RpcConnectionRegistry can fully replace the old MasterRegistry. This sub-task deprecates MasterRegistry.

      Sub-task HBASE-26173 Return only a sub set of region servers as bootstrap nodes

      For a large cluster, which may have thousands of region servers, it is not a good idea to return all the region servers to clients as bootstrap nodes. So we should add a config on the server side to control the maximum number of bootstrap nodes returned to clients. I think a default value of 5 or 10 should be enough.
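      One way such a cap could work is to return a random subset of the live region servers, so that different clients spread across different nodes. The sketch below assumes random selection, which the description does not specify; BootstrapNodeSampler and maxNodes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not HBase code.
final class BootstrapNodeSampler {

  /**
   * Returns at most maxNodes servers. When the cluster is larger than the cap,
   * a random subset is chosen; the input list is never modified.
   */
  static List<String> sample(List<String> allServers, int maxNodes, Random random) {
    if (maxNodes <= 0) {
      throw new IllegalArgumentException("maxNodes must be positive");
    }
    List<String> copy = new ArrayList<>(allServers);
    if (copy.size() <= maxNodes) {
      return copy;
    }
    Collections.shuffle(copy, random);
    return new ArrayList<>(copy.subList(0, maxNodes));
  }
}
```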

      Sub-task HBASE-26174 Make rpc connection registry the default registry on 3.0.0

      Just a follow-up to HBASE-26172. Since MasterRegistry has been deprecated, we should no longer make it the default for 3.0.0.

      Sub-task HBASE-26180 Introduce a initial refresh interval for RpcConnectionRegistry

      As end users can configure any nodes in a cluster as the initial bootstrap nodes, it is possible that many different end users will configure the same machine, which would overload that machine. So we should use a shorter delay for the initial refresh, to let users quickly switch to the bootstrap nodes we want them to connect to.
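      The scheduling rule described here can be sketched as a small pure function: use the short initial delay until the first refresh has succeeded, then fall back to the regular interval. RefreshSchedule and its parameter names are hypothetical and do not correspond to actual HBase configuration keys.

```java
// Hypothetical sketch, not HBase code.
final class RefreshSchedule {

  /**
   * Picks the delay before the next refresh of the bootstrap nodes.
   *
   * @param refreshedOnce        whether at least one refresh has completed
   * @param initialDelaySecs     short delay used before the first refresh
   * @param periodicIntervalSecs regular delay used afterwards
   */
  static long nextDelaySecs(boolean refreshedOnce, long initialDelaySecs,
                            long periodicIntervalSecs) {
    return refreshedOnce ? periodicIntervalSecs : initialDelaySecs;
  }
}
```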

      Sub-task HBASE-26181 Region server and master could use itself as ConnectionRegistry

      This is an optimization to reduce the pressure on ZooKeeper. We do not want to use MasterRegistry as the ConnectionRegistry for our cluster connection because:

          // We use ZKConnectionRegistry for all the internal communication, primarily for these reasons:
          // - Decouples RS and master life cycles. RegionServers can continue be up independent of
          //   masters' availability.
          // - Configuration management for region servers (cluster internal) is much simpler when adding
          //   new masters or removing existing masters, since only clients' config needs to be updated.
          // - We need to retain ZKConnectionRegistry for replication use anyway, so we just extend it for
          //   other internal connections too.

      The above comments are in our code, in the HRegionServer.cleanupConfiguration method.

      But since masters and region servers now both implement the ClientMetaService interface, we are free to let the ConnectionRegistry use this in-memory information directly, instead of going to ZooKeeper again.

      Sub-task HBASE-26182 Allow disabling refresh of connection registry endpoint

      One possible production deployment is to put something like LVS in front of all the region servers as a load balancer, so clients just connect to the LVS IP instead of going to the region servers directly to get registry information.

      For this scenario we do not need to refresh the endpoints any more.

      The simplest way is to set the refresh interval to -1.
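      A minimal sketch of how a negative interval could switch the periodic refresh off, assuming the refresh is driven by a ScheduledExecutorService. OptionalRefresher is a hypothetical name, and treating every negative value (not only -1) as "disabled" is an assumption of this sketch.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not HBase code.
final class OptionalRefresher {

  /**
   * Schedules the refresh task, or does nothing when the interval is negative.
   *
   * @return the scheduled task, or null when refresh is disabled
   */
  static ScheduledFuture<?> start(ScheduledExecutorService pool, long intervalSecs,
                                  Runnable refresh) {
    if (intervalSecs < 0) {
      // Disabled: the client keeps using the configured endpoint, e.g. a load balancer address.
      return null;
    }
    if (intervalSecs == 0) {
      throw new IllegalArgumentException("interval must be positive, or negative to disable");
    }
    return pool.scheduleWithFixedDelay(refresh, intervalSecs, intervalSecs, TimeUnit.SECONDS);
  }
}
```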





              Unassigned Unassigned
              zhangduo Duo Zhang