Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-23904

Procedure updating meta and Master shutdown are incompatible: CODE-BUG

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0
    • Component/s: amv2
    • Labels:
      None

      Description

      Chasing flakies, studying TestMasterAbortWhileMergingTable, I noticed a failure because

      2020-02-27 00:57:51,702 ERROR [PEWorker-6] procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception: pid=14, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; MergeTableRegionsProcedure table=test, regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c], force=false
      java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@28b956c7 rejected from java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
              at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
              at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
              at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
              at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
              at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
              at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
              at org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
              at org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
              at org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
              at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
              at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
              at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
              at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
              at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
              at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
       

      A few seconds above, as part of the test, we'd stopped Master

      
      2020-02-27 00:57:51,620 INFO  [Time-limited test] regionserver.HRegionServer(2212): ***** STOPPING region server 'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
      2020-02-27 00:57:51,620 INFO  [Time-limited test] regionserver.HRegionServer(2226): STOPPED: Stopping master 0 

      The rejected execution damages the merge procedure. It shows as an unhandled CODE-BUG.

      Why we let a runtime exception out when trying to update meta is mildly interesting. We use Throwables.propagateIfPossible(e, IOException.class) from guava which at first blush would seem to throw the exception if it an IOE else return. In code, if return, we'll wrap whatever makes it through with an IOE.  But propagateIfPossible is a little sneaky in that if the passed Exception is a RuntimeException, as the Reject is, it will go ahead and throw and NOT return.  Not sure if this was authors' understanding (Duo Zhang  ? HBASE-21789 for hbase-2.2.0). Looking at the old code, which called makeIOExceptionOfException from ProtobufUtil, if I read it right, this would wrap the exception in an IOE regardless whether a RuntimeException or not.

      A little digging exposes that likely root of the problem is that the Master is stopping. Its connection, which is used by the merge procedure when updating meta, is being shutdown too. The rejected exception is probably because the pool has been shutdown. Hard to tell for sure as Master doesn't log the minutae of services closed.

      The propagateIfPossible facility is used in a few places. Its addition to MetaTableAccessor is in one place only by HBASE-21789. I could restore the old behavior easy enough (Was afraid we had to deal with this issue around ALL meta table accesses via MTA).

       

       

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                stack Michael Stack
                Reporter:
                stack Michael Stack
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: