Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-13317

Region server reportForDuty stuck looping if there is a master change

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 0.98.12, 2.0.0
    • Fix Version/s: 1.0.1, 1.1.0, 0.98.12, 2.0.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      During cluster startup, region server reportForDuty gets stuck looping if there is a master change.

      2015-03-22 11:15:16,186 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf274,60000,1427045883965 with port=60020, startcode=1427048115174
      2015-03-22 11:15:16,272 WARN  [regionserver60020] regionserver.HRegionServer: error telling master we are up
      com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
      	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
      	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
      	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
      	at java.lang.Thread.run(Thread.java:745)
      2015-03-22 11:15:16,274 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
      2015-03-22 11:15:19,274 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
      2015-03-22 11:15:19,275 WARN  [regionserver60020] regionserver.HRegionServer: error telling master we are up
      com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
      	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
      	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
      	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
      	at java.lang.Thread.run(Thread.java:745)
      2015-03-22 11:15:19,276 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
      2015-03-22 11:15:22,276 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
      2015-03-22 11:15:22,296 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
      2015-03-22 11:15:22,296 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
      2015-03-22 11:15:25,296 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
      2015-03-22 11:15:25,299 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
      2015-03-22 11:15:25,299 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
      2015-03-22 11:15:28,299 INFO  [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
      2015-03-22 11:15:28,302 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
      2015-03-22 11:15:28,302 WARN  [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
      

      What happended is the region server first got master=bigaperf274,60000,1427045883965. Before it was able to report successfully, the maser changed to bigaperf273,60000,1427048108439.
      We were supposed to open a new connection to the new master. But we never did, looping and trying to old address forever.

        Attachments

        1. addendum-for-branch-1.patch
          2 kB
          Jerry He
        2. HBASE-13317-branch-1-v5.patch
          10 kB
          Jerry He
        3. HBASE-13317-master-v5.patch
          10 kB
          Jerry He
        4. HBASE-13317-0.98-v5.patch
          10 kB
          Jerry He
        5. HBASE-13317-0.98-v4.patch
          10 kB
          Jerry He
        6. HBASE-13317-0.98-v3.patch
          10 kB
          Jerry He
        7. HBASE-13317-0.98-v2.patch
          1 kB
          Jerry He
        8. HBASE-13317-0.98.patch
          1.0 kB
          Jerry He

          Issue Links

            Activity

              People

              • Assignee:
                jinghe Jerry He
                Reporter:
                jinghe Jerry He
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: