Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Implemented
0.94.23
None
None
Description
If hbck gets an IOException while undeploying a region, it currently only logs a warning and carries on. Instead, we should make sure the region is not re-assigned by the master until we know that the RS has been marked as dead, and optionally let the user confirm the action. Otherwise we can end up in a split-brain situation, with clients talking to different RSs that serve the same region.
The offending part is here in HBaseFsck.undeployRegions():
private void undeployRegions(HbckInfo hi) throws IOException, InterruptedException {
  for (OnlineEntry rse : hi.deployedEntries) {
    LOG.debug("Undeploy region " + rse.hri + " from " + rse.hsa);
    try {
      HBaseFsckRepair.closeRegionSilentlyAndWait(admin, rse.hsa, rse.hri);
      offline(rse.hri.getRegionName());
    } catch (IOException ioe) {
      LOG.warn("Got exception when attempting to offline region "
          + Bytes.toString(rse.hri.getRegionName()), ioe);
    }
  }
}
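One possible shape for the guarded behaviour described above, as a minimal standalone sketch rather than the actual fix. Everything in it is hypothetical: the RegionCloser, DeadServerCheck and Confirmation interfaces and the undeploy() helper are illustrative stand-ins and not HBase APIs, and servers and regions are represented as plain strings. The idea is that a failed close blocks re-assignment unless the server is known to be dead or the user explicitly confirms.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SafeUndeploySketch {

    /** Hypothetical stand-in for closing a region on a region server. */
    interface RegionCloser {
        void close(String server, String region) throws IOException;
    }

    /** Hypothetical check: has the master marked this server as dead? */
    interface DeadServerCheck {
        boolean isDead(String server);
    }

    /** Hypothetical prompt that lets the user force the action. */
    interface Confirmation {
        boolean confirm(String message);
    }

    /**
     * Tries to close the region on every server that reports it as deployed.
     * Returns the servers whose close failed and that are neither known dead
     * nor confirmed by the user. The caller should offline and re-assign the
     * region only when the returned list is empty.
     */
    static List<String> undeploy(String region, List<String> servers,
                                 RegionCloser closer, DeadServerCheck dead,
                                 Confirmation confirmation) {
        List<String> unresolved = new ArrayList<>();
        for (String server : servers) {
            try {
                closer.close(server, region);
            } catch (IOException ioe) {
                // Do not swallow the failure: the server may still be serving the region.
                if (dead.isDead(server)) {
                    continue; // a dead server cannot serve the region any more
                }
                String question = "Close of " + region + " on " + server
                        + " failed (" + ioe.getMessage() + "). Re-assign anyway?";
                if (!confirmation.confirm(question)) {
                    unresolved.add(server); // blocks re-assignment
                }
            }
        }
        return unresolved;
    }
}
```

With this shape, the caller would skip the offline and re-assign step for a region whenever undeploy() returns a non-empty list, and report those servers to the user instead.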