HBase / HBASE-12131

[hbck] undeployRegions should handle gracefully network partitions and other exceptions to avoid the same region deployed multiple times


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Implemented
    • Affects Version/s: 0.94.23
    • Fix Version/s: None
    • Component/s: hbck
    • Labels: None

    Description

      If we get an IOException (which we currently ignore) while hbck is undeploying regions, we should make sure the master does not re-assign that region before we know the RS has been marked as dead, and optionally let the user confirm the action. Otherwise we end up in a split-brain situation, with clients talking to different RSs serving the same region.

      The offending part is here in HBaseFsck.undeployRegions():

      private void undeployRegions(HbckInfo hi) throws IOException, InterruptedException {
        for (OnlineEntry rse : hi.deployedEntries) {
          LOG.debug("Undeploy region " + rse.hri + " from " + rse.hsa);
          try {
            HBaseFsckRepair.closeRegionSilentlyAndWait(admin, rse.hsa, rse.hri);
            offline(rse.hri.getRegionName());
          } catch (IOException ioe) {
            LOG.warn("Got exception when attempting to offline region "
                + Bytes.toString(rse.hri.getRegionName()), ioe);
          }
        }
      }
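
      One possible shape for the behaviour requested above, as a minimal sketch and not the actual fix: when closing a region fails, the region is only offlined (and so made eligible for re-assignment) if its server is confirmed dead; otherwise it is reported back so the caller can warn the user or ask for confirmation. The Cluster interface, the Deployment class and the undeploy() method are hypothetical stand-ins invented for illustration, not HBase APIs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types exist in HBase. */
final class SafeUndeploySketch {

    /** Hypothetical stand-in for the cluster operations hbck would need. */
    interface Cluster {
        void closeRegion(String server, String region) throws IOException;
        boolean isServerDead(String server);
        void offline(String region) throws IOException;
    }

    /** One (server, region) pair, loosely mirroring OnlineEntry in the snippet above. */
    static final class Deployment {
        final String server;
        final String region;

        Deployment(String server, String region) {
            this.server = server;
            this.region = region;
        }
    }

    /**
     * Tries to undeploy every entry. Returns the entries that were NOT offlined
     * because the close failed and the server could not be confirmed dead, or
     * because offlining itself failed. Those regions may still be served, so
     * the caller should not re-assign them.
     */
    static List<Deployment> undeploy(Cluster cluster, List<Deployment> deployed) {
        List<Deployment> unresolved = new ArrayList<>();
        for (Deployment d : deployed) {
            try {
                cluster.closeRegion(d.server, d.region);
            } catch (IOException closeFailure) {
                // The close may not have reached the server (e.g. a network
                // partition). Only continue if the server is known to be dead.
                if (!cluster.isServerDead(d.server)) {
                    unresolved.add(d);
                    continue;
                }
            }
            try {
                cluster.offline(d.region);
            } catch (IOException offlineFailure) {
                unresolved.add(d);
            }
        }
        return unresolved;
    }
}
```

      The key difference from the snippet above is that a failed close no longer falls through silently: the caller gets an explicit list of regions whose state is unknown and can decide whether to retry, wait for the server to be declared dead, or prompt the user.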
      

          People

            Assignee: Unassigned
            Reporter: Esteban Gutierrez
            Votes: 1
            Watchers: 8
