[HBASE-13792] Regionserver unable to report to master when master is restarted - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0
Component/s: IPC/RPC
Labels:
None
Environment:

x86_64 GNU/Linux

Description

I was testing the master branch on a distributed cluster and noticed that when the master is restarted on a running cluster, the regionservers are unable to report back once the master is up again.
Things went back to normal after I restarted the regionservers. The logs show that the regionservers correctly detect the master znode.
After some digging I noticed that the client implementation in RpcClientFactory has been changed to AsyncRpcClient, so I tried running the cluster with the previous RpcClientImpl and the issue was gone.
So the issue is probably caused by AsyncRpcClient, which is unable to reconnect to the master once the original connection is gone.
I was able to fix the issue by creating a new rpcClient object inside HRegionServer#createRegionServerStatusStub() and using it for channel creation. Here is the diff:

diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
index fa56966..27e658c 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
@@ -2219,8 +2219,11 @@ public class HRegionServer extends HasThread implements
           break;
         }
         try {
+          LOG.info("***Creating new client connection");
+          rpcClient = RpcClientFactory.createClient(conf, clusterId, new InetSocketAddress(
+            rpcServices.isa.getAddress(), 0));
           BlockingRpcChannel channel =
-            this.rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
+          rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
               shortOperationTimeout);
           intf = RegionServerStatusService.newBlockingStub(channel);
           break;

If this is an acceptable way of fixing the issue, I will create and attach a patch.
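
To make the idea behind the workaround easier to see in isolation, here is a minimal, self-contained sketch of the pattern: build a fresh client each time the status stub is created instead of reusing one whose connection may be dead. It does not use any HBase classes. FakeClient, StubFactory and their methods are hypothetical stand-ins invented for illustration, and the "broken" flag only simulates a client that cannot reconnect. Since this issue was resolved as a duplicate, this should be read as an illustration of the proposal above, not as the fix that was eventually adopted.

```java
import java.util.function.Supplier;

public class ReconnectSketch {

    /** Hypothetical stand-in for an RPC client that cannot recover once its connection is lost. */
    static class FakeClient {
        private final int generation;
        private boolean broken;

        FakeClient(int generation) {
            this.generation = generation;
        }

        /** Simulates the master going away underneath this client. */
        void markBroken() {
            broken = true;
        }

        String openChannel() {
            if (broken) {
                throw new IllegalStateException("connection lost");
            }
            return "channel-from-client-" + generation;
        }
    }

    /** Hypothetical stand-in for the code path that creates the status stub. */
    static class StubFactory {
        private final Supplier<FakeClient> clientFactory;
        private FakeClient client;

        StubFactory(Supplier<FakeClient> clientFactory) {
            this.clientFactory = clientFactory;
            this.client = clientFactory.get();
        }

        /** Old behaviour: keep using the client created at startup. */
        String createStubReusingClient() {
            return client.openChannel();
        }

        /** Proposed behaviour: replace the client before opening the channel. */
        String createStubWithFreshClient() {
            client = clientFactory.get();
            return client.openChannel();
        }

        FakeClient currentClient() {
            return client;
        }
    }

    public static void main(String[] args) {
        int[] counter = {0};
        StubFactory factory = new StubFactory(() -> new FakeClient(++counter[0]));

        // Simulate a master restart that leaves the original client unusable.
        factory.currentClient().markBroken();

        try {
            factory.createStubReusingClient();
        } catch (IllegalStateException e) {
            System.out.println("reuse failed: " + e.getMessage());
        }

        // A fresh client gets a working channel again.
        System.out.println(factory.createStubWithFreshClient());
    }
}
```

The trade-off in this sketch is that the old client object is simply dropped. In a real server the previous client would presumably need to be closed so its resources are released, which the diff above does not do either.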

Issue Links

is duplicated by

HBASE-13793 Regionserver unable to report to master when master is restarted

Closed

is related to

HBASE-13337 Table regions are not assigning back, after restarting all regionservers at once.

Closed

People

Assignee: Unassigned

Reporter: Samir Ahmic

Votes: 0

Watchers: 2

Dates

Created: 28/May/15 12:22

Updated: 24/Jun/22 17:40

Resolved: 06/Jul/15 12:50