HBase / HBASE-4470

ServerNotRunningException coming out of assignRootAndMeta kills the Master


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.90.4
    • Fix Version/s: 0.92.2, 0.94.1, 0.95.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      I'm surprised we still have issues like this. I didn't get a hit while googling, so forgive me if there's already a JIRA about it.

      When the master starts, it verifies the locations of root and meta before assigning them. If the region server it contacts has started but is not running yet, you'll get this:

      2011-09-23 04:47:44,859 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: RemoteException connecting to RS
      org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)

      at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
      at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
      at $Proxy6.getProtocolVersion(Unknown Source)
      at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
      at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
      at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
      at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)
      at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388)
      at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287)
      at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484)
      at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441)
      at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)
      at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)

      I hit that 3-4 times this week while debugging something else. The worst part is that when you restart the master, it treats the restart as a failover, but none of the regions are assigned, so it takes an eternity to get fully back online.
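
The failure mode described above can be illustrated with a small, self-contained sketch. This is not the attached patch and does not use the real HBase classes: `ServerNotRunningException` here is a local stand-in for `org.apache.hadoop.hbase.ipc.ServerNotRunningException`, and `LocationProbe` and `verifyLocation` are hypothetical names introduced only for illustration. It also ignores the `RemoteException` wrapping visible in the log. The idea, assuming the caller can safely reassign an unverified catalog region, is to report "not verified" rather than let the exception propagate up through startup.

```java
import java.io.IOException;

public class CatalogVerifySketch {

    /** Local stand-in for HBase's ServerNotRunningException (assumption, not the real class). */
    static class ServerNotRunningException extends IOException {
        ServerNotRunningException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical probe of the server believed to host a catalog region. */
    interface LocationProbe {
        boolean hostsRegion() throws IOException;
    }

    /**
     * Returns true only if the probe confirms the location. A server that has
     * started but is not running yet is reported as "not verified" instead of
     * letting the exception abort the caller. Other IOExceptions still propagate.
     */
    static boolean verifyLocation(LocationProbe probe) throws IOException {
        try {
            return probe.hostsRegion();
        } catch (ServerNotRunningException e) {
            // Treat as an unverified location so the caller can decide to reassign.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        boolean verified = verifyLocation(() -> {
            throw new ServerNotRunningException("Server is not running yet");
        });
        System.out.println("verified = " + verified); // prints "verified = false"
    }
}
```

Whether returning "not verified" or retrying until the server comes up is the better policy depends on how the caller handles reassignment; the sketch only shows the first option.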

      Attachments

        1. HBASE-4470-90.patch (5 kB, Gregory Chanan)
        2. HBASE-4470-v2-90.patch (6 kB, Gregory Chanan)
        3. HBASE-4470-v2-92_94.patch (3 kB, Gregory Chanan)
        4. HBASE-4470-v2-trunk.patch (3 kB, Gregory Chanan)

          People

            Assignee: Gregory Chanan (gchanan)
            Reporter: Jean-Daniel Cryans (jdcryans)
            Votes: 0
            Watchers: 9
