Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
0.90.0, 0.92.0
None
Reviewed
Description
During a region open, a Progressable periodically "tickles" the region's OPENING node in ZooKeeper so that the master does not time out the open operation. Currently, if a tickle fails for any reason, the RegionServer aborts. However, it can be "normal" for the master to abort an RS's region open operation, so the RS should handle this failure the way it does elsewhere: by reverting the open.
We had a cluster trip over an unrelated issue: for some reason, the tickle was not happening within 30 seconds, so the master timed out every open. Because the RS aborts when the tickle fails with BadVersion, every RegionServer eventually aborted itself, taking down the whole cluster.
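The behaviour requested here can be illustrated with a minimal, self-contained Java sketch. It does not use the real HBase or ZooKeeper APIs: `FakeZNode`, `BadVersionException`, `OpenRegionHandler`, `tickleOpening` and `forceSet` are all hypothetical stand-ins. The idea it models is that a tickle is a version-checked write. Once the master has changed the OPENING node (for example after timing the open out), the RS's next tickle fails, and the handler reverts the open instead of aborting the whole RegionServer.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: none of these classes are HBase or ZooKeeper APIs. */
public class OpeningTickleSketch {

    /** Stand-in for ZooKeeper's BadVersion error on a conditional write. */
    static class BadVersionException extends Exception {
        BadVersionException(String msg) { super(msg); }
    }

    /** Minimal in-memory stand-in for a versioned znode. */
    static class FakeZNode {
        private String state;
        private int version = 0;

        FakeZNode(String state) { this.state = state; }

        synchronized int getVersion() { return version; }
        synchronized String getState() { return state; }

        /** Conditional write: succeeds only if expectedVersion matches; returns the new version. */
        synchronized int setData(String newState, int expectedVersion) throws BadVersionException {
            if (expectedVersion != version) {
                throw new BadVersionException("expected " + expectedVersion + " but was " + version);
            }
            state = newState;
            return ++version;
        }

        /** Unconditional write, e.g. the master taking the node back after a timeout. */
        synchronized void forceSet(String newState) {
            state = newState;
            version++;
        }
    }

    /** Hypothetical region-open handler; the names are illustrative only. */
    static class OpenRegionHandler {
        private final FakeZNode node;
        private int knownVersion;
        boolean opened = false;
        boolean reverted = false;
        boolean serverAborted = false; // the old behaviour would set this; the sketch never does

        OpenRegionHandler(FakeZNode node) {
            this.node = node;
            this.knownVersion = node.getVersion();
        }

        /** Rewrites OPENING so the master sees progress; false means we lost the node. */
        boolean tickleOpening() {
            try {
                knownVersion = node.setData("OPENING", knownVersion);
                return true;
            } catch (BadVersionException e) {
                return false;
            }
        }

        /** Runs the open in steps, tickling after each; reverts rather than aborting on failure. */
        void process(int steps, Runnable betweenSteps) {
            for (int i = 0; i < steps; i++) {
                betweenSteps.run();
                if (!tickleOpening()) {
                    // Treat a failed tickle as the master having aborted this open.
                    reverted = true;
                    return;
                }
            }
            opened = true;
        }
    }

    public static void main(String[] args) {
        // Normal case: nothing else touches the node, so every tickle succeeds.
        FakeZNode ok = new FakeZNode("OPENING");
        OpenRegionHandler h1 = new OpenRegionHandler(ok);
        h1.process(3, () -> {});
        System.out.println("normal: opened=" + h1.opened + " reverted=" + h1.reverted);

        // Timeout case: the master rewrites the node mid-open, bumping its version.
        FakeZNode timedOut = new FakeZNode("OPENING");
        OpenRegionHandler h2 = new OpenRegionHandler(timedOut);
        AtomicInteger calls = new AtomicInteger();
        h2.process(3, () -> {
            if (calls.incrementAndGet() == 2) timedOut.forceSet("OFFLINE");
        });
        System.out.println("timed out: opened=" + h2.opened + " reverted=" + h2.reverted
                + " aborted=" + h2.serverAborted);
    }
}
```

Running `main` should print `normal: opened=true reverted=false` followed by `timed out: opened=false reverted=true aborted=false`. In the second case only that one region open is abandoned, so a master that keeps timing out opens no longer causes RegionServers to abort one after another.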