Details
- Sub-task
- Status: Resolved
- Critical
- Resolution: Fixed
- 2.1.0
- None
Description
We need this to effect repair when there is damage.
If the procedure WALs AND a server WAL dir are lost or cleaned, or if we crash during a partial split (unlikely scenarios but nonetheless possible), a Master can be stuck unable to become active because there is no assign procedure for hbase:meta in the system.
The reasonable argument over in HBASE-21035 is that attempts at auto-repair under these extremes could cause other issues, so, at least until we learn more, we punt to the operator for fix-up.
To reproduce the catastrophe, see notes in HBASE-21035 (and allan163's test).
UPDATE: HBASE-21191 adds a Master "holding pattern", entered if on startup the Master does not have an assign for meta (possible if we lose all Master WAL procedures). The holding pattern is needed because we were exiting after one minute of RPC'ing to the old meta location. Admin#assign cannot be used to inject an assign because it is rejected with "Master is initializing". So we need to be able to assign hbase:meta even while the Master is initializing. Also, while in here, add the ability to bulk assign, because assigning a region at a time from the shell only works if the offlined region count is in the low tens; it fails when thousands are offline.
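The gating problem described in the update can be illustrated with a toy model. This is only a sketch, not the HBase implementation: `ToyMaster`, `adminAssign`, and `repairAssigns` are hypothetical names invented for illustration, and the rule that initialization completes once meta is assigned is a simplifying assumption. The one real value is `1588230740`, the encoded region name of hbase:meta.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HoldingPatternSketch {

    /** Encoded region name of hbase:meta. */
    static final String META_ENCODED_NAME = "1588230740";

    /** Toy stand-in for a Master; hypothetical, NOT the HBase implementation. */
    static final class ToyMaster {
        private boolean initialized = false;
        private final List<String> assigned = new ArrayList<>();

        /** Ordinary client path: refused until initialization finishes. */
        void adminAssign(String encodedRegionName) {
            if (!initialized) {
                throw new IllegalStateException("Master is initializing");
            }
            assigned.add(encodedRegionName);
        }

        /** Hypothetical repair path: takes a batch and skips the initialization check. */
        int repairAssigns(List<String> encodedRegionNames) {
            assigned.addAll(encodedRegionNames);
            // Simplifying assumption: the toy Master leaves its holding
            // pattern as soon as an assign for meta has been injected.
            if (assigned.contains(META_ENCODED_NAME)) {
                initialized = true;
            }
            return encodedRegionNames.size();
        }

        boolean isInitialized() {
            return initialized;
        }

        List<String> assigned() {
            return assigned;
        }
    }

    public static void main(String[] args) {
        ToyMaster master = new ToyMaster();
        try {
            master.adminAssign(META_ENCODED_NAME);
        } catch (IllegalStateException e) {
            System.out.println("admin path rejected: " + e.getMessage());
        }
        // One call carries meta plus any number of other offlined regions.
        int queued = master.repairAssigns(Arrays.asList(META_ENCODED_NAME, "regionA", "regionB"));
        System.out.println("queued " + queued + ", initialized: " + master.isInitialized());
    }
}
```

The point of the sketch is the asymmetry between the two entry points: the ordinary path checks the initialization flag, whereas the repair path does not and accepts a list, so thousands of regions need not be submitted one shell command at a time.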
Attachments
Issue Links
- is related to HBASE-21035 Meta Table should be able to online even if all procedures are lost (Resolved)
- links to