    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.0.1
    • Fix Version/s: 3.0.0-alpha-1, 2.2.0, 2.1.1, 2.0.3
    • Component/s: amv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Found this one while investigating HBASE-20921. After the root procedure (ModifyTableProcedure in this case) rolled back, an ArrayIndexOutOfBoundsException was thrown:

      2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): CODE-BUG: Uncaught runtime exception for pid=5973, state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.lang.NullPointerException; ModifyTableProcedure table=IntegrationTestBigLinkedList
      java.lang.UnsupportedOperationException: unhandled state=MODIFY_TABLE_REOPEN_ALL_REGIONS
              at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
              at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
              at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
              at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
      2018-07-18 01:39:10,243 WARN  [PEWorker-8] procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
      java.lang.ArrayIndexOutOfBoundsException: 1
              at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
              at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
              at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
              at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
              at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
              at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
              at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
      

      This is a very serious condition. After this exception was thrown, the exclusive lock held by ModifyTableProcedure was never released, so every procedure against this table was blocked until the master restarted. Because the lock info for a procedure is not restored on restart, the other procedures could then proceed. It is quite embarrassing that one bug saved us from another (the lock-restore bug will be fixed in HBASE-20846).

      I tried to reproduce this using the test case in HBASE-20921 but could not.
      An easy way to resolve this is to add a try/catch, making sure that no matter what happens, the table's exclusive lock is always released.
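
      A minimal sketch of the try/catch idea, not the actual patch: the class, the `tableLock` field, and `deleteFromStore` are hypothetical stand-ins for the table's exclusive lock and the store delete that threw. It only illustrates that releasing the lock in a `finally` block keeps a failed cleanup from leaving the table locked.

```java
import java.util.concurrent.locks.ReentrantLock;

public class RollbackLockSketch {
    // Hypothetical stand-in for the table's exclusive lock.
    static final ReentrantLock tableLock = new ReentrantLock();

    // Hypothetical stand-in for the store delete that threw in the report.
    static void deleteFromStore(boolean fail) {
        if (fail) {
            throw new ArrayIndexOutOfBoundsException("1");
        }
    }

    // Returns true if cleanup succeeded; the lock is released either way.
    static boolean rollbackAndCleanup(boolean failDelete) {
        tableLock.lock();
        try {
            deleteFromStore(failDelete);
            return true;
        } catch (RuntimeException e) {
            // Log and continue so the worker does not die holding the lock.
            System.err.println("cleanup failed: " + e);
            return false;
        } finally {
            // Runs on both the normal and the exceptional path.
            tableLock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(rollbackAndCleanup(true));  // prints "false"
        System.out.println(tableLock.isLocked());      // prints "false"
    }
}
```

      Even when the simulated delete fails, the lock is free afterwards, so later procedures against the same table would not be blocked.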

        Attachments

        1. HBASE-20973.branch-2.0.001.patch
          1 kB
          Allan Yang
        2. HBASE-20973.branch-2.0.002.patch
          2 kB
          Allan Yang
        3. HBASE-20973.patch
          7 kB
          Duo Zhang

            People

            • Assignee:
              zhangduo Duo Zhang
            • Reporter:
              allan163 Allan Yang
            • Votes:
              0
            • Watchers:
              6
