  1. HBase
  2. HBASE-20644

Master shutdown due to service ClusterSchemaServiceImpl failing to start


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      From hbase-hbase-master-ctr-e138-1518143905142-329221-01-000003.hwx.site.log:

      2018-05-23 22:14:29,750 ERROR [master/ctr-e138-1518143905142-329221-01-000003:20000] master.HMaster: Failed to become active master
      java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
              at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
              at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
              at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1054)
              at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:918)
              at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2023)
      

      Earlier in the log, the namespace region, 01a7f9ba9fffd691f261d3fbc620da06, was recorded as OPEN on 01-000007.hwx.site,16020,1527112194788, a server instance that the master then declared not online:

      2018-05-23 21:54:34,786 INFO  [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.RegionStateStore: Load hbase:meta entry region=01a7f9ba9fffd691f261d3fbc620da06, regionState=OPEN, lastHost=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788, regionLocation=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788, seqnum=43
      2018-05-23 21:54:34,787 INFO  [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.AssignmentManager: Number of RegionServers=1
      2018-05-23 21:54:34,788 INFO  [master/ctr-e138-1518143905142-329221-01-000003:20000] assignment.AssignmentManager: KILL RegionServer=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788 hosting regions but not online.
      

      Later, a different instance on 007 registered with the master, yet reads of hbase:namespace against that instance kept failing with NotServingRegionException:

      2018-05-23 21:55:13,541 INFO  [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.ServerManager: Registering regionserver=ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112506002
      ...
      2018-05-23 21:55:43,881 INFO  [master/ctr-e138-1518143905142-329221-01-000003:20000] client.RpcRetryingCallerImpl: Call exception, tries=12, retries=12, started=69001 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06. is not online on ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112506002
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3273)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3250)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2446)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
      

      There was no OPEN request for 01a7f9ba9fffd691f261d3fbc620da06 sent to that server instance.
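
      The mismatch is easier to see when the two server names are split into their parts. An HBase server name has the form host,port,startcode, so a restarted region server keeps its host and port but gets a new start code. The sketch below is illustrative only: ServerInstance and its methods are hypothetical names, not HBase classes. It compares the location stored in hbase:meta with the instance that later registered.

```java
/**
 * Hypothetical helper (not part of HBase): models a "host,port,startcode"
 * server name so that two names can be compared by address and by instance.
 */
class ServerInstance {
    final String host;
    final int port;
    final long startCode;

    ServerInstance(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    /** Parses a server name such as "example.host,16020,1527112194788". */
    static ServerInstance parse(String name) {
        String[] parts = name.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected host,port,startcode: " + name);
        }
        return new ServerInstance(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    /** True when both names point at the same host and port. */
    boolean sameAddress(ServerInstance other) {
        return host.equals(other.host) && port == other.port;
    }

    /** True only when the start code matches too, i.e. the same process. */
    boolean sameInstance(ServerInstance other) {
        return sameAddress(other) && startCode == other.startCode;
    }

    public static void main(String[] args) {
        // regionLocation loaded from hbase:meta at 21:54:34
        ServerInstance inMeta = parse(
            "ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112194788");
        // instance that registered with the master at 21:55:13
        ServerInstance registered = parse(
            "ctr-e138-1518143905142-329221-01-000007.hwx.site,16020,1527112506002");

        System.out.println("same address:  " + inMeta.sameAddress(registered));
        System.out.println("same instance: " + inMeta.sameInstance(registered));
    }
}
```

      The two names share an address but not a start code, which fits the observation above: the region was recorded as OPEN on the old instance, and the new instance was never asked to open it.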

      From hbase-hbase-regionserver-ctr-e138-1518143905142-329221-01-000007.hwx.site.log:

      2018-05-23 21:52:27,414 INFO  [RS_CLOSE_REGION-regionserver/ctr-e138-1518143905142-329221-01-000007:16020-1] regionserver.HRegion: Closed hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06.
      

      Then region server 007 restarted:

      Wed May 23 21:55:03 UTC 2018 Starting regionserver on ctr-e138-1518143905142-329221-01-000007.hwx.site
      

      After the restart, region 01a7f9ba9fffd691f261d3fbc620da06 never appeared again in the 007 region server log.
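
      As a cross-check on the ordering of events, the "started=69001 ms ago" field of the retry message can be subtracted from the time it was logged. The sketch below does only that arithmetic on timestamps quoted in this report; RetryTimeline and its methods are hypothetical names, not HBase code.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper (not part of HBase) for timestamps in the quoted logs. */
class RetryTimeline {
    // Timestamp layout used by the master and region server log lines.
    static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static LocalDateTime parse(String timestamp) {
        return LocalDateTime.parse(timestamp, LOG_TS);
    }

    /** Start of a retry loop, given the log time and the "started=N ms ago" value. */
    static LocalDateTime retryStart(String loggedAt, long startedMsAgo) {
        return parse(loggedAt).minus(Duration.ofMillis(startedMsAgo));
    }

    public static void main(String[] args) {
        LocalDateTime metaLoaded = parse("2018-05-23 21:54:34,786");
        LocalDateTime registered = parse("2018-05-23 21:55:13,541");
        LocalDateTime masterFailed = parse("2018-05-23 22:14:29,750");
        LocalDateTime start = retryStart("2018-05-23 21:55:43,881", 69001);

        System.out.println("retry loop started: " + start.format(LOG_TS));
        System.out.println("ms after hbase:meta load: "
            + Duration.between(metaLoaded, start).toMillis());
        System.out.println("started before new instance registered: "
            + start.isBefore(registered));
        System.out.println("minutes from retry start to master failure: "
            + Duration.between(start, masterFailed).toMinutes());
    }
}
```

      Under these assumptions the retry loop began at about 21:54:34,880, a fraction of a second after the hbase:meta entry was loaded and before the new instance on 007 registered. That is consistent with the master reading hbase:namespace from the location recorded in hbase:meta, though the logs quoted here do not show this directly.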

          People

            Assignee: Unassigned
            Reporter: Romil Choksi