Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-13792

Regionserver unable to report to master when master is restarted

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Duplicate
    • 2.0.0
    • 2.0.0
    • IPC/RPC
    • None
    • x86_64 GNU/Linux

    Description

      I was testing master branch on distributed cluster and i notice that when master is restarted on running cluster regionservers are unable report back when master is up again.
      Things back to normal after i restarted regionservers. Logs showing that regionservers are correctly detecting master znode.
      After some digging i notice that we have changed client implementation in RpcClientFactory to AsyncRpcClient so i have tried running cluster with previous RpcClientImpl and issue was gone.
      So issue is probably caused by AsyncRpcClient which is unable reconnect to master once original connection is gone.
      I was able to fix issue by creating new rpcClient object inside HRegionServer#createRegionServerStatusStub() and using it for channel creation here is diff:

      diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
      index fa56966..27e658c 100644
      --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
      +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
      @@ -2219,8 +2219,11 @@ public class HRegionServer extends HasThread implements
                 break;
               }
               try {
      +          LOG.info("***Creating new client connection");
      +          rpcClient = RpcClientFactory.createClient(conf, clusterId, new InetSocketAddress(
      +            rpcServices.isa.getAddress(), 0));
                 BlockingRpcChannel channel =
      -            this.rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
      +          rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
                     shortOperationTimeout);
                 intf = RegionServerStatusService.newBlockingStub(channel);
                 break;
      

      If this is acceptable way for fixing this issue i will create and attach patch?

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              asamir Samir Ahmic
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: