Details
Description
There is a race condition inĀ TestProcedureSkipPersistence. After the procedure is added, the test stops ProcedureExecutor. In some cases the procedure is not added to the queue in time.
Failing execution:
2019-02-06 14:18:11,133 INFO [Time-limited test] procedure2.TimeoutExecutorThread(82): ADDED pid=-1, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner; timeout=30000, timestamp=1549491521133 2019-02-06 14:18:11,135 INFO [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure; timeout=2000, timestamp=1549491493135 2019-02-06 14:18:11,137 INFO [Time-limited test] hbase.Waiter(189): Waiting up to [30,000] milli-secs(wait.for.ratio=[1]) 2019-02-06 14:18:11,139 INFO [Time-limited test] procedure2.ProcedureTestingUtility(125): RESTART - Stop 2019-02-06 14:18:11,139 INFO [Time-limited test] procedure2.ProcedureExecutor(702): Stopping 2019-02-06 14:18:11,139 INFO [Time-limited test] wal.WALProcedureStore(331): Stopping the WAL Procedure Store, isAbort=false 2019-02-06 14:18:11,140 DEBUG [PEWorker-1] procedure2.RootProcedureState(153): Add procedure pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure as the 0th rollback step 2019-02-06 14:18:11,141 WARN [PEWorker-1] procedure2.ProcedureExecutor$WorkerThread(2074): Worker terminating UNNATURALLY null java.lang.RuntimeException: the store must be running before inserting data at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:710) at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.update(WALProcedureStore.java:603) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.updateStoreOnExec(ProcedureExecutor.java:1943) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1809) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1481) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2058) 2019-02-06 14:18:11,145 INFO [Time-limited test] procedure2.ProcedureTestingUtility(137): RESTART - Start
In a successful run the ProcExecutor is stopped AFTER the procedure is actually in the queue.
Successful:
2019-02-07 15:48:08,731 INFO [Time-limited test] procedure2.TimeoutExecutorThread(82): ADDED pid=-1, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner; timeout=30000, timestamp=1549550918731 2019-02-07 15:48:08,731 INFO [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure; timeout=2000, timestamp=1549550890731 2019-02-07 15:48:08,732 DEBUG [PEWorker-1] procedure2.RootProcedureState(153): Add procedure pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure as the 0th rollback step 2019-02-07 15:48:08,732 INFO [WALProcedureStoreSyncThread] wal.WALProcedureStore(1217): Remove all state logs with ID less than 1, since all the active procedures are in the latest log 2019-02-07 15:48:08,733 DEBUG [WALProcedureStoreSyncThread] wal.WALProcedureStore(1239): Removed log=file:/Users/peter.somogyi/Cloudera/hbase/hbase-procedure/target/test-data/b9a1969a-85a4-15e8-7da5-6198f5acf2de/proc-logs/pv2-00000000000000000001.log, activeLogs=[/Users/peter.somogyi/Cloudera/hbase/hbase-procedure/target/test-data/b9a1969a-85a4-15e8-7da5-6198f5acf2de/proc-logs/pv2-00000000000000000002.log] 2019-02-07 15:48:08,734 INFO [Time-limited test] hbase.Waiter(189): Waiting up to [30,000] milli-secs(wait.for.ratio=[1]) 2019-02-07 15:48:08,734 INFO [Time-limited test] procedure2.ProcedureTestingUtility(125): RESTART - Stop 2019-02-07 15:48:08,734 INFO [Time-limited test] procedure2.ProcedureExecutor(702): Stopping 2019-02-07 15:48:08,735 INFO [Time-limited test] wal.WALProcedureStore(331): Stopping the WAL Procedure Store, isAbort=false 2019-02-07 15:48:08,735 INFO [Time-limited test] procedure2.ProcedureTestingUtility(137): RESTART - Start