Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
0.90.3
None
None
Added better error messages for regions that are offline or split parents
Description
I just saw a job fail some of its tasks with RegionOfflineException. Normally this exception means the table is disabled, but it can also be thrown when the region is the offline parent of a split. Probably 99.999% of the time users never hit this, because SplitTransaction takes the parent offline and adds the first daughter quickly enough. My cluster was slow enough that I could actually see it happen.
Maybe HCM (HConnectionManager) should check not only whether the region is offline but also whether it is split, and retry in that case instead of failing?
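A minimal sketch of the client-side check proposed here, assuming the client can read an "offline" flag and a "split" flag from the region's catalog row. `RegionMeta`, `classify`, `locate`, and the retry count and pause parameters are hypothetical illustrations, not HConnectionManager's actual code. The idea is that an offline split parent is treated as a transient state worth retrying, while an offline region that is not split still fails immediately.

```java
import java.util.function.Supplier;

public class RegionLookupSketch {

    /** Hypothetical stand-in for a region's catalog row; not HBase's HRegionInfo. */
    record RegionMeta(String name, boolean offline, boolean split) {}

    /** Hypothetical exception playing the role of RegionOfflineException. */
    static class RegionOfflineException extends RuntimeException {
        RegionOfflineException(String msg) { super(msg); }
    }

    /** What the client should do with the catalog row it just read. */
    enum Action { USE, RETRY }

    static Action classify(RegionMeta meta) {
        if (!meta.offline()) {
            return Action.USE;   // region is online: use it
        }
        if (meta.split()) {
            return Action.RETRY; // split parent: the daughters should appear shortly
        }
        // Offline but not split: most likely the table is disabled, so fail fast.
        throw new RegionOfflineException("region " + meta.name() + " is offline (table disabled?)");
    }

    /** Re-reads the catalog until a usable region appears or the retries run out. */
    static RegionMeta locate(Supplier<RegionMeta> catalog, int maxRetries, long pauseMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            RegionMeta meta = catalog.get();
            if (classify(meta) == Action.USE) {
                return meta;
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(pauseMs); // back off before looking the region up again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for daughters", e);
                }
            }
        }
        throw new RegionOfflineException("still a split parent after " + maxRetries + " retries");
    }
}
```

Keeping the not-split case as an immediate exception preserves the existing behaviour for tables that really are disabled; only the short window between offlining the parent and adding the daughters is retried.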