Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.0.0, 0.98.12, 2.0.0
-
None
-
Reviewed
Description
During cluster startup, region server reportForDuty gets stuck looping if there is a master change.
2015-03-22 11:15:16,186 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf274,60000,1427045883965 with port=60020, startcode=1427048115174 2015-03-22 11:15:16,272 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719) at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277) at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896) at java.lang.Thread.run(Thread.java:745) 2015-03-22 11:15:16,274 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying. 2015-03-22 11:15:19,274 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174 2015-03-22 11:15:19,275 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719) at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277) at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896) at java.lang.Thread.run(Thread.java:745) 2015-03-22 11:15:19,276 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying. 2015-03-22 11:15:22,276 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174 2015-03-22 11:15:22,296 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet 2015-03-22 11:15:22,296 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying. 2015-03-22 11:15:25,296 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174 2015-03-22 11:15:25,299 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet 2015-03-22 11:15:25,299 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying. 2015-03-22 11:15:28,299 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174 2015-03-22 11:15:28,302 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet 2015-03-22 11:15:28,302 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
What happended is the region server first got master=bigaperf274,60000,1427045883965. Before it was able to report successfully, the maser changed to bigaperf273,60000,1427048108439.
We were supposed to open a new connection to the new master. But we never did, looping and trying to old address forever.
Attachments
Attachments
Issue Links
- duplicates
-
HBASE-12813 Reporting region in transition shouldn't loop forever
- Closed
- relates to
-
HBASE-13345 Fix LocalHBaseCluster so that different region server impl can be used for different slaves
- Closed