HBase / HBASE-21624

master startup should not wait (or die) on assigning meta replicas

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: amv2, meta
    • Labels: None

      Description

      Because of an unrelated bug, a meta replica is stuck in transition forever.
      The master runs fine without it, but the initializer thread has not finished initialization for ~19 hours and is stuck in the state shown in the thread dump below.
      Waiting for the replica assignments does not seem necessary: they could be fire-and-forget, and normal region handling should take care of them afterwards.

      Thread 118 (master/...:17000:becomeActiveMaster):
        State: TIMED_WAITING
        Blocked count: 281
        Waited count: 67059
        Stack:
          java.lang.Thread.sleep(Native Method)
          org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:209)
          org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:192)
          org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:151)
          org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:140)
          org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:133)
          org.apache.hadoop.hbase.master.assignment.AssignmentManager.assign(AssignmentManager.java:569)
          org.apache.hadoop.hbase.master.MasterMetaBootstrap.assignMetaReplicas(MasterMetaBootstrap.java:84)
          org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1146)
          org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2342)
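
      To make the fire-and-forget suggestion concrete, here is a minimal sketch in plain Java. It does not use any HBase classes: `stuckAssign`, `initializeWithoutWaiting`, and the latch are hypothetical stand-ins for a replica assignment that never completes and for the initializer thread. It only illustrates that initialization can return while the assignment is still pending, and is not a proposed patch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FireAndForgetSketch {

    // Hypothetical stand-in for a replica assignment that stays stuck
    // until something else (here: the latch) unblocks it.
    static void stuckAssign(CountDownLatch release) {
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits the assignment on a background pool and returns immediately,
    // instead of sleeping in a wait loop until the assignment completes.
    static boolean initializeWithoutWaiting(ExecutorService pool, CountDownLatch release) {
        CompletableFuture.runAsync(() -> stuckAssign(release), pool)
            .exceptionally(t -> {
                // A failure is only logged; it does not abort initialization.
                System.err.println("replica assignment failed: " + t);
                return null;
            });
        return true; // initialization is considered finished
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);

        boolean initialized = initializeWithoutWaiting(pool, release);
        System.out.println("initialized=" + initialized);

        // Unblock the simulated assignment so the JVM can exit cleanly.
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

      In this sketch the caller never observes the outcome of the assignment, so it relies on some other mechanism (in the issue's terms, normal region handling) to retry or repair it later.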
      

      Additionally, and somewhat related: if the server hosting meta dies during replica assignment, the master also dies immediately, which is unnecessary.

      2018-12-14 21:00:55,331 ERROR [master/...:17000:becomeActiveMaster] master.HMaster: Failed to become active master
      org.apache.hadoop.hbase.HBaseIOException: rit=OFFLINE, location=null, table=hbase:meta, region=534574363 is currently in transition
                      at org.apache.hadoop.hbase.master.assignment.AssignmentManager.preTransitCheck(AssignmentManager.java:545)
                      at org.apache.hadoop.hbase.master.assignment.AssignmentManager.assign(AssignmentManager.java:563)
                      at org.apache.hadoop.hbase.master.MasterMetaBootstrap.assignMetaReplicas(MasterMetaBootstrap.java:84)
                      at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1146)
                      at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2342)
                      at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:591)
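
      One way to avoid dying here would be to treat a failed replica assignment as non-fatal during startup. The sketch below is plain Java with hypothetical names (`ReplicaAssigner`, `assignReplicas`), not the actual `MasterMetaBootstrap` code. It assumes replica IDs start at 1, with 0 being the primary meta region, and simply records the replicas whose assignment threw so they can be handled later.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TolerantReplicaAssignSketch {

    // Hypothetical callback standing in for the real assignment call.
    interface ReplicaAssigner {
        void assign(int replicaId) throws IOException;
    }

    // Tries every replica and collects the IDs whose assignment failed,
    // instead of letting the first IOException propagate and abort startup.
    static List<Integer> assignReplicas(int replicaCount, ReplicaAssigner assigner) {
        List<Integer> deferred = new ArrayList<>();
        // Assumption: replica 0 is the primary, so replicas start at 1.
        for (int replicaId = 1; replicaId < replicaCount; replicaId++) {
            try {
                assigner.assign(replicaId);
            } catch (IOException e) {
                System.err.println("deferring replica " + replicaId + ": " + e.getMessage());
                deferred.add(replicaId);
            }
        }
        return deferred;
    }

    public static void main(String[] args) {
        // Simulate replica 1 being rejected because it is in transition.
        List<Integer> deferred = assignReplicas(3, id -> {
            if (id == 1) {
                throw new IOException("region is currently in transition");
            }
        });
        System.out.println("deferred replicas: " + deferred); // prints "deferred replicas: [1]"
    }
}
```

      Whether the deferred replicas are retried explicitly or left to regular region handling is a design choice this sketch leaves open.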
      

            People

            • Assignee: Unassigned
            • Reporter: Sergey Shelukhin (sershe)
            • Votes: 0
            • Watchers: 5