HBASE-19937

Ensure createRSGroupTable be called after ProcedureExecutor and LoadBalancer are initialized

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-beta-2
    • Fix Version/s: 2.0.0-beta-2, 1.4.2, 2.0.0
    • Component/s: rsgroup
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The hbase:rsgroup table is created by a call to createRSGroupTable, which is triggered when the master loads its system coprocessors:

       

      844  this.cpHost = new MasterCoprocessorHost(this, this.conf);

      If ProcedureExecutor has not been initialized before createRSGroupTable runs, the call fails with the following exception:

       

      Exception in thread "org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker-localhost,49715,1518088607130" java.lang.IllegalArgumentException
      at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.submitProcedure(ProcedureExecutor.java:847)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.submitProcedure(ProcedureExecutor.java:835)
      at org.apache.hadoop.hbase.master.HMaster.createSystemTable(HMaster.java:1795)
      at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.createRSGroupTable(RSGroupInfoManagerImpl.java:858)
      at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:823)
      at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:743)
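The failure mode above can be illustrated with a small, self-contained model. This is only a sketch: ToyExecutor is a hypothetical stand-in, not the HBase ProcedureExecutor, and the exact precondition checked at ProcedureExecutor.java:847 is not shown in this report, so the "not started" guard below is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

public class EarlySubmitDemo {
    /** Hypothetical stand-in for a procedure executor; NOT the HBase class. */
    static final class ToyExecutor {
        // Assumption for illustration: -1 means "not started yet".
        private final AtomicLong lastProcId = new AtomicLong(-1);

        void start() {
            lastProcId.set(0);
        }

        long submit(Runnable work) {
            // Guard that rejects submissions made before start().
            if (lastProcId.get() < 0) {
                throw new IllegalArgumentException("executor not started");
            }
            work.run();
            return lastProcId.incrementAndGet();
        }
    }

    /** Returns true if submitting before start() is rejected. */
    static boolean submitFailsBeforeStart() {
        ToyExecutor executor = new ToyExecutor();
        try {
            executor.submit(() -> { });
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    /** Returns the id assigned to the first submission after start(). */
    static long submitAfterStart() {
        ToyExecutor executor = new ToyExecutor();
        executor.start();
        return executor.submit(() -> { });
    }

    public static void main(String[] args) {
        System.out.println("rejected before start: " + submitFailsBeforeStart());
        System.out.println("first id after start: " + submitAfterStart());
    }
}
```

In this model the same submit call succeeds or throws depending only on whether start() ran first, which mirrors the ordering dependency described in this report.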

      ProcedureExecutor, however, is only initialized later, by calling

      848  startServiceThreads();

      LoadBalancer is initialized later still, by calling

      868  this.balancer.initialize();

      If LoadBalancer has not been initialized before createRSGroupTable runs, the procedure fails with the following exception:

      2018-02-02,16:12:45,688 ERROR org.apache.hadoop.hbase.procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception: pid=7, state=RUNNABLE:CREATE_TABLE_ASSIGN_REGIONS; CreateTableProcedure table=hbase:rsgroup
      java.lang.NullPointerException
      at org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer.generateGroupMaps(RSGroupBasedLoadBalancer.java:254)
      at org.apache.hadoop.hbase.rsgroup.RSGroupBasedLoadBalancer.roundRobinAssignment(RSGroupBasedLoadBalancer.java:162)
      at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createRoundRobinAssignProcedures(AssignmentManager.java:603)
      at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:108)
      at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:51)
      at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:182)
      at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1227)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1738)
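A minimal sketch of how an uninitialized balancer can produce this kind of NullPointerException. ToyBalancer and its groupToServers field are hypothetical names invented for illustration; which field is actually null at RSGroupBasedLoadBalancer.java:254 is not stated in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UninitializedBalancerDemo {
    /** Hypothetical balancer whose state only exists after initialize(). */
    static final class ToyBalancer {
        // Stays null until initialize() is called (assumption for illustration).
        private Map<String, List<String>> groupToServers;

        void initialize() {
            groupToServers = new HashMap<>();
            groupToServers.put("default", new ArrayList<>(Arrays.asList("rs1", "rs2")));
        }

        String assign(String region) {
            // Dereferences groupToServers; throws NullPointerException if
            // initialize() has not run yet.
            List<String> servers = groupToServers.get("default");
            return servers.get(Math.floorMod(region.hashCode(), servers.size()));
        }
    }

    /** Returns true if assigning before initialize() throws an NPE. */
    static boolean assignBeforeInitializeThrows() {
        try {
            new ToyBalancer().assign("hbase:rsgroup,region-0");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** Returns the server picked once the balancer has been initialized. */
    static String assignAfterInitialize() {
        ToyBalancer balancer = new ToyBalancer();
        balancer.initialize();
        return balancer.assign("hbase:rsgroup,region-0");
    }

    public static void main(String[] args) {
        System.out.println("NPE before initialize: " + assignBeforeInitializeThrows());
        System.out.println("assigned to: " + assignAfterInitialize());
    }
}
```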

       

      As a result of CreateTableProcedure.rollbackState, subsequent attempts may then log a TableExistsException warning, as follows:

      2018-02-02,16:12:55,503 WARN org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: Failed to perform check
      java.io.IOException: Failed to create group table. org.apache.hadoop.hbase.TableExistsException: hbase:rsgroup
      at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.createRSGroupTable(RSGroupInfoManagerImpl.java:877)

       

      After several automatic retries, the RSGroupStartupWorker thread keeps looping and prints the following logs:

      2018-02-02,16:23:17,626 INFO org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: RSGroup table=hbase:rsgroup isOnline=true, regionCount=0, assignCount=0, rootMetaFound=true
      2018-02-02,16:23:17,730 INFO org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: RSGroup table=hbase:rsgroup isOnline=true, regionCount=0, assignCount=0, rootMetaFound=true
      2018-02-02,16:23:17,834 INFO org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: RSGroup table=hbase:rsgroup isOnline=true, regionCount=0, assignCount=0, rootMetaFound=true
      2018-02-02,16:23:17,937 INFO org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker: RSGroup table=hbase:rsgroup isOnline=true, regionCount=0, assignCount=0, rootMetaFound=true

       

      When rsgroup shell commands are used in this state, they report that rsgroup is currently in "offline mode".

       

      The root cause is that nothing enforces the order between createRSGroupTable and the initialization of ProcedureExecutor and LoadBalancer. If createRSGroupTable executes first, it hits the exceptions described above.
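One generic way to enforce such an ordering is to have the startup worker block on a latch that is released only after both prerequisites are initialized. The sketch below assumes nothing about the attached patches and is not necessarily how they resolve the issue; the class name, latch, and event strings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StartupOrderingDemo {
    /** Runs a toy startup sequence and returns the observed event order. */
    static List<String> runStartup() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        // One count per prerequisite: the executor and the balancer.
        CountDownLatch prerequisitesReady = new CountDownLatch(2);

        // Worker is started early (like a coprocessor-launched thread) but
        // waits for the prerequisites before creating the table.
        Thread worker = new Thread(() -> {
            try {
                if (prerequisitesReady.await(5, TimeUnit.SECONDS)) {
                    events.add("createRSGroupTable");
                } else {
                    events.add("timed out waiting for prerequisites");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "toy-startup-worker");
        worker.start();

        // The main thread initializes the prerequisites afterwards.
        events.add("executor initialized");
        prerequisitesReady.countDown();
        events.add("balancer initialized");
        prerequisitesReady.countDown();

        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining worker", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runStartup());
    }
}
```

Because each event is recorded before the corresponding countDown, the worker can only record createRSGroupTable after both initialization events, regardless of how early the worker thread was started.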

       

      Attachments

        1. import-order.png
          17 kB
          Xiaolin Ha
        2. HBASE-19937.branch-2.006.patch
          7 kB
          Xiaolin Ha
        3. HBASE-19937.branch-2.005.patch
          7 kB
          Xiaolin Ha
        4. HBASE-19937.branch-2.004.patch
          7 kB
          Xiaolin Ha
        5. HBASE-19937.branch-2.003.patch
          6 kB
          Xiaolin Ha
        6. HBASE-19937.branch-2.002.patch
          3 kB
          Xiaolin Ha
        7. HBASE-19937.branch-2.001.patch
          3 kB
          Xiaolin Ha

            People

              Xiaolin Ha