Details
- Type: Sub-task
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- 2.1.0, 2.0.2
- None
- None
Description
After restoring the procedures from the Procedure WALs, we put the runnable procedures back into the queue to execute. Before HBASE-20846 the order was not a problem, since the first procedure to execute would acquire the lock itself. Since HBASE-20846, however, the locks are restored as well. So if a procedure that does not hold the lock is executed before a procedure that holds the lock in the same queue, there is a race condition in which the procedures in that queue may never be executed at all.
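The ordering concern can be sketched as follows. When restored procedures are requeued, the ones whose locks were restored go ahead of the ones that still have to acquire a lock. This is a minimal sketch under assumptions: `RestoreOrderSketch`, `RestoredProc` and `requeue` are hypothetical names, one `Deque` stands in for a table's queue, and it is not the change made in the patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of requeueing procedures after WAL replay (not HBase code). */
public class RestoreOrderSketch {

  /** A restored procedure and whether its lock was restored along with it. */
  static final class RestoredProc {
    final long procId;
    final boolean holdsLock;

    RestoredProc(long procId, boolean holdsLock) {
      this.procId = procId;
      this.holdsLock = holdsLock;
    }
  }

  /**
   * Builds one queue from procedures in WAL replay order, placing procedures
   * that already hold their lock ahead of those that must still acquire one.
   * Relative order within each group is preserved.
   */
  static Deque<RestoredProc> requeue(List<RestoredProc> replayed) {
    List<RestoredProc> lockHolders = new ArrayList<>();
    List<RestoredProc> others = new ArrayList<>();
    for (RestoredProc p : replayed) {
      (p.holdsLock ? lockHolders : others).add(p);
    }
    Deque<RestoredProc> queue = new ArrayDeque<>(lockHolders);
    queue.addAll(others);
    return queue;
  }

  public static void main(String[] args) {
    Deque<RestoredProc> q = requeue(List.of(
        new RestoredProc(1, false),   // needs the table's exclusive lock
        new RestoredProc(2, true)));  // region procedure with restored locks
    System.out.println(q.peekFirst().procId); // prints 2
  }
}
```

Keeping relative order inside each group avoids reshuffling procedures beyond what the lock state requires.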
The race condition is as follows:
1. A procedure that needs the table's exclusive lock is put into the table's queue, but the table's shared lock is held by a region procedure. Since no one holds the exclusive lock, the queue is added to the run queue for execution. Soon after, a worker thread sees that the procedure cannot execute because it does not hold the lock (and cannot take it while the shared lock is held), so it stops executing it and removes the queue from the run queue.
2. At the same time, the region procedure that holds the table's shared lock and the region's exclusive lock is put into the table's queue. But since the queue is already in the run queue, it is not added again.
3. Then, as decided in step 1, the table's queue is removed from the run queue.
4. From then on, nothing puts the table's queue back into the run queue, so no worker ever executes the procedures in it.
A test case in the patch reproduces this race.
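The four steps amount to a lost wakeup: the queue still holds runnable work, but nothing will schedule it again. The single-threaded Java sketch below replays the steps in that order against a toy scheduler. `LostWakeupReplay`, `Scheduler`, `TableQueue`, `headCanRun`, and the rule that a procedure without a restored lock cannot run while the table's shared lock is held are simplifying assumptions, not the MasterProcedureScheduler implementation or the patch's test case.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy, single-threaded replay of the race (hypothetical; not MasterProcedureScheduler). */
public class LostWakeupReplay {

  static final class Proc {
    final String name;
    final boolean holdsLock; // true if the lock was restored from the WAL

    Proc(String name, boolean holdsLock) {
      this.name = name;
      this.holdsLock = holdsLock;
    }
  }

  static final class TableQueue {
    final Deque<Proc> procs = new ArrayDeque<>();
    // Assumption: the region procedure's shared lock on the table was restored.
    boolean sharedLockHeld = true;
  }

  static final class Scheduler {
    final Set<TableQueue> runQueue = new LinkedHashSet<>();

    /** Enqueue a procedure; Set.add does nothing if the queue is already scheduled. */
    void add(TableQueue q, Proc p) {
      q.procs.addLast(p);
      runQueue.add(q);
    }

    /** The head runs only if it holds its lock or the exclusive lock is free to take. */
    boolean headCanRun(TableQueue q) {
      Proc head = q.procs.peekFirst();
      return head != null && (head.holdsLock || !q.sharedLockHeld);
    }
  }

  /** Returns true if the table queue ends up holding work but not scheduled. */
  static boolean replay() {
    Scheduler s = new Scheduler();
    TableQueue table = new TableQueue();

    // Step 1: the exclusive-lock procedure is queued and the queue is scheduled;
    // a worker then finds the head blocked and decides to unschedule the queue.
    s.add(table, new Proc("needs-exclusive-lock", false));
    boolean workerSawBlocked = !s.headCanRun(table);

    // Step 2: meanwhile the region procedure, which holds its locks, is queued.
    // The queue is already in the run queue, so it is not added again.
    s.add(table, new Proc("region-proc-with-locks", true));

    // Step 3: the worker carries out its earlier decision.
    if (workerSawBlocked) {
      s.runQueue.remove(table);
    }

    // Step 4: nothing reschedules the queue, although its second procedure could run.
    return !s.runQueue.contains(table) && !table.procs.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println("stuck = " + replay()); // prints "stuck = true"
  }
}
```

In this toy model, queueing the lock-holding procedure first makes `headCanRun` true in step 1, so the worker never unschedules the queue.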
Issue Links
- is related to: HBASE-21376 Add some verbose log to MasterProcedureScheduler (Resolved)
- relates to: HBASE-21375 Revisit the lock and queue implementation in MasterProcedureScheduler (Resolved)