Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
2.1.0, 2.0.1
None
Reviewed
Description
After HBASE-20846, we restore lock info for procedures on restart. However, there is a case where the lock can end up held by a procedure that has already succeeded. Since that procedure will never execute again, it will hold the lock forever.
1. All children of pid=1208 had finished, but the master was killed before procedure 1208 woke up
2018-08-05 02:20:14,465 INFO [PEWorker-8] procedure2.ProcedureExecutor(1659): Finished subprocedure(s) of pid=1208, ppid=1206, state=RUNNABLE, hasLock=true; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034; resume parent processing.
2018-08-05 02:20:14,466 INFO [PEWorker-8] procedure2.ProcedureExecutor(1296): Finished pid=1232, ppid=1208, state=SUCCESS, hasLock=false; AssignProcedure table=IntegrationTestBigLinkedList, region=c2a23a735f16df57299dba6fd4599f2f, target=e010125050127.bja,60020,1533403109034 in 1.5060sec
2. The master restarts. Because procedure 1208 held the lock before the restart, the lock was restored for it
2018-08-05 02:20:30,803 DEBUG [Thread-15] procedure2.ProcedureExecutor(456): Loading pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034
2018-08-05 02:20:30,818 DEBUG [Thread-15] procedure2.Procedure(898): pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 held the lock before restarting, call acquireLock to restore it.
2018-08-05 02:20:30,818 INFO [Thread-15] procedure.MasterProcedureScheduler(631): pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 checking lock on c2a23a735f16df57299dba6fd4599f2f
3. Since procedure 1208 has already succeeded, it will not execute again, so it will hold the lock forever
We need to check the state of the procedure before restoring locks. If the procedure is already finished (succeeded or rolled back), we do not need to acquire the lock for it.
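The proposed check could look roughly like the sketch below. This is only an illustration of the idea, not the actual patch: the class `LockRestoreSketch`, the simplified `State` enum, the `Proc` type, and the method `pidsToRestoreLockFor` are all hypothetical names, and pid 9001 is made up to provide a contrasting unfinished procedure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; these are not the real HBase classes.
final class LockRestoreSketch {

  // Simplified stand-in for a procedure's state.
  enum State { RUNNABLE, WAITING, SUCCESS, ROLLEDBACK }

  // Minimal view of a procedure as loaded from the procedure store.
  static final class Proc {
    final long pid;
    final State state;
    final boolean heldLockBeforeRestart;

    Proc(long pid, State state, boolean heldLockBeforeRestart) {
      this.pid = pid;
      this.state = state;
      this.heldLockBeforeRestart = heldLockBeforeRestart;
    }

    // A finished procedure (success or rollback) will never execute again.
    boolean isFinished() {
      return state == State.SUCCESS || state == State.ROLLEDBACK;
    }
  }

  // Returns the pids whose locks should be re-acquired after a restart.
  static List<Long> pidsToRestoreLockFor(List<Proc> loaded) {
    List<Long> result = new ArrayList<>();
    for (Proc p : loaded) {
      if (!p.heldLockBeforeRestart) {
        continue; // nothing to restore
      }
      if (p.isFinished()) {
        continue; // the check proposed above: skip finished procedures
      }
      result.add(p.pid);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Proc> loaded = Arrays.asList(
        new Proc(9001L, State.RUNNABLE, true), // made-up unfinished procedure
        new Proc(1208L, State.SUCCESS, true)); // the case from this report
    System.out.println(pidsToRestoreLockFor(loaded)); // prints [9001]
  }
}
```

With this guard, a procedure loaded in the state shown for pid=1208 (SUCCESS, lock held before the restart) is skipped, so nothing is left holding the region lock.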