Description
HBaseFsck#checkRegionConsistency() checks region consistency and repairs any corruption if requested. However, this function can throw exceptions in some situations. For example, as part of region repair it calls HBaseFsckRepair#waitUntilAssigned(); if a region stays in transition for more than 120 seconds (the default value of the "hbase.hbck.assign.timeout" configuration), an IOException is thrown.
The problem is that a single exception in checkRegionConsistency() kills the entire hbck operation, because the exception propagates up.
The proposal is that if the region is not the META region (or a system table region, if we prefer), we skip the region when HBaseFsck#checkRegionConsistency() fails on it. We could print the skipped regions in the summary section so that users know to either re-run hbck or investigate a potential issue with those regions.
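A minimal sketch of the proposed behavior, to make the control flow concrete. It does not use the real HBaseFsck classes: RegionChecker, checkAll(), summarize(), and the use of plain region-name strings are all hypothetical stand-ins chosen for illustration. The idea is simply to catch the IOException per region, rethrow it only for the META region, and remember everything else for the summary.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipOnFailureSketch {

    /** Hypothetical stand-in for the per-region consistency check (not the HBaseFsck API). */
    public interface RegionChecker {
        void check(String regionName) throws IOException;
    }

    /**
     * Runs the check on every region. A failure on the META region is rethrown
     * and aborts the run; a failure on any other region is recorded and the
     * loop moves on to the next region.
     */
    public static List<String> checkAll(List<String> regions, String metaRegion,
                                        RegionChecker checker) throws IOException {
        List<String> skipped = new ArrayList<>();
        for (String region : regions) {
            try {
                checker.check(region);
            } catch (IOException e) {
                if (region.equals(metaRegion)) {
                    throw e; // META must be consistent, so do not continue
                }
                skipped.add(region); // remember it for the summary
            }
        }
        return skipped;
    }

    /** Builds the summary text that tells the user which regions were skipped. */
    public static String summarize(List<String> skipped) {
        if (skipped.isEmpty()) {
            return "No regions were skipped.";
        }
        return skipped.size() + " region(s) skipped, re-run or investigate: "
                + String.join(", ", skipped);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: region "r1" times out while waiting for assignment.
        List<String> skipped = checkAll(Arrays.asList("meta", "r1", "r2"), "meta",
                name -> {
                    if (name.equals("r1")) {
                        throw new IOException("region stuck in transition");
                    }
                });
        System.out.println(summarize(skipped));
    }
}
```

Whether system table regions should be treated like META here is the open question raised above; in this sketch that would amount to widening the rethrow condition.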