Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
Reviewed
Description
If the masterprocwals have been removed (operator error, hdfs dataloss, or because we have gotten ourselves into a pathological state where there are hundreds of masterprocwals to process and it is taking so long that we just want to start over), then master startup has a dilemma. Master startup needs hbase:meta to be online. If the masterprocwals have been removed, there may be no outstanding assign or servercrashprocedure with coverage for hbase:meta (I ran into this issue repeatedly in internal testing when purging masterprocwals on a large test cluster). Worse, when master startup cannot find an online hbase:meta, it exits after exhausting the RPC retries.
So, we need a holding-pattern for master startup when hbase:meta is not online, if only so that an operator can schedule an assign for meta or schedule fixup procedures (HBASE-21035 has discussion on why we cannot just auto-schedule an assign of meta).
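
As a rough illustration of what such a holding-pattern could look like, here is a minimal sketch in Java. It is not the actual HBase code: the class `MetaHoldingPattern`, the method `waitForMetaOnline`, and the `metaOnline` / `stopped` suppliers are all hypothetical names invented for this example. The idea is only that startup polls and logs rather than exiting, so an operator has time to step in.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a startup holding-pattern; not the HBase implementation. */
public class MetaHoldingPattern {

  /**
   * Blocks until metaOnline reports true or stopped reports true.
   * Both suppliers are stand-ins (assumptions) for whatever checks the master really uses.
   *
   * @return true if meta came online, false if we were asked to stop while waiting
   */
  public static boolean waitForMetaOnline(BooleanSupplier metaOnline,
      BooleanSupplier stopped, long pollMillis) throws InterruptedException {
    long polls = 0;
    while (!metaOnline.getAsBoolean()) {
      if (stopped.getAsBoolean()) {
        return false; // master is shutting down; give up the wait
      }
      // Log on the first poll and then every tenth poll, so the operator knows why startup is parked.
      if (polls++ % 10 == 0) {
        System.err.println("hbase:meta is not online; holding master startup until it is assigned");
      }
      Thread.sleep(pollMillis);
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulate an operator assigning meta: the check starts succeeding on the third call.
    AtomicInteger checks = new AtomicInteger();
    boolean online = waitForMetaOnline(() -> checks.incrementAndGet() >= 3, () -> false, 10L);
    System.out.println("meta online: " + online + " after " + checks.get() + " checks");
  }
}
```

The point of the sketch is that the wait has no retry budget: it ends only when meta comes online or the master is told to stop, instead of exiting once RPC retries are exhausted.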