When the Phoenix system tables become corrupt, recovery is a painstaking process of low-level examination of table contents and manual manipulation of them with the HBase shell. This is very difficult work with no margin of safety, and it is a critical usability gap.
At the OS level, we have fsck.
At the HDFS level, we have fsck (integrity checking only, though).
At the HBase level, we have hbck.
At the Phoenix level, we lack a system table repair tool.
Implement a tool that:
- Does not depend on the Phoenix client.
- Supports integrity checking of SYSTEM tables. Check for the existence of all required columns in entries. Check that entries exist for all Phoenix managed tables (implies Phoenix should add supporting advisory-only metadata to the HBase table schemas). Check that serializations are valid.
- Supports complete repair of SYSTEM.CATALOG and recreation, if necessary, of other tables like SYSTEM.STATS which can be dropped to recover from an emergency. We should be able to drop SYSTEM.CATALOG (or any other SYSTEM table), run the tool, and have a completely correct recreation of SYSTEM.CATALOG available at the end of its execution.
- To the extent we have or introduce cross-system-table invariants, check them and offer a repair or reconstruction option.
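As a rough illustration of the integrity-checking requirement above, here is a minimal sketch in Python that works on in-memory data only. Everything in it is an assumption for illustration: the `check_catalog` function, the `Finding` type, and the choice of `TABLE_TYPE`, `TABLE_SEQ_NUM`, and `COLUMN_COUNT` as the required header columns are hypothetical, and the rows are plain dictionaries rather than anything read from HBase. It covers only the "required columns exist" and "entries exist for all Phoenix managed tables" checks; serialization validation and repair are left out.

```python
from dataclasses import dataclass

# Hypothetical set of columns a table header row must carry.
# The real required set would come from the Phoenix schema definition.
REQUIRED_HEADER_COLUMNS = frozenset({"TABLE_TYPE", "TABLE_SEQ_NUM", "COLUMN_COUNT"})


@dataclass(frozen=True)
class Finding:
    table: str
    problem: str


def check_catalog(catalog_rows, managed_tables, required=REQUIRED_HEADER_COLUMNS):
    """Report integrity problems without modifying anything.

    catalog_rows: dict mapping table name -> {column qualifier: raw bytes},
        standing in for header rows scanned from SYSTEM.CATALOG.
    managed_tables: names of HBase tables that carry the (assumed)
        advisory "managed by Phoenix" marker in their table schema.
    """
    findings = []
    managed = set(managed_tables)

    # Every managed table needs a catalog entry with all required columns.
    for table in sorted(managed):
        row = catalog_rows.get(table)
        if row is None:
            findings.append(Finding(table, "no catalog entry"))
            continue
        for column in sorted(required.difference(row)):
            findings.append(Finding(table, f"missing column {column}"))

    # Catalog entries that point at no managed table are reported too.
    for table in sorted(set(catalog_rows) - managed):
        findings.append(Finding(table, "catalog entry without managed table"))

    return findings
```

Keeping the check read-only and returning a list of findings would let a later repair step decide what to do with each problem, in the same way hbck separates reporting from fixing.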