Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.0.0-alpha-1
-
None
-
Reviewed
Description
When unit test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta() is enabled and run several times, it fails intermittently. Cause is meta recovery is done at two different places:
- ServerCrashProcedure.processMeta()
- HMaster.finishActiveMasterInitialization()
and its not coordinated.
When HMaster.finishActiveMasterInitialization() gets to submit splitMetaLog() first and while its running call from ServerCrashProcedure.processMeta() fails causing step to be retried again in a loop.
When ServerCrashProcedure.processMeta() submits splitMetaLog after splitMetaLog from HMaster.finishActiveMasterInitialization() is finished, success is returned without doing any work.
But if ServerCrashProcedure.processMeta() submits splitMetaLog request and while its going HMaster.finishActiveMasterInitialization() submits it test fails with exception.
stack and I discussed the possible solution:
Create RecoverMetaProcedure and call it where required. Procedure framework provides mutual exclusion and requires idempotence, which should fix the problem.
Attachments
Attachments
Issue Links
- relates to
-
HBASE-18278 [AMv2] Enable and fix uni test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
- Resolved
-
HBASE-18366 Fix flaky test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
- Closed
- links to