  1. HBase
  2. HBASE-3946

A split region can be brought online again when the standby HMaster becomes the active one


    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.3
    • Fix Version/s: 0.90.4
    • Component/s: None
    • Labels:
    • Hadoop Flags:


      (The cluster has two HMasters, one active and one standby.)

      1. When the active HMaster shuts down, the standby becomes the active one and enters the processFailover() method:

      if (regionCount == 0) {
        LOG.info("Master startup proceeding: cluster startup");
        this.assignmentManager.cleanoutUnassigned();
        this.assignmentManager.assignAllUserRegions();
      } else {
        LOG.info("Master startup proceeding: master failover");
        this.assignmentManager.processFailover();
      }

      2. After that, the user regions are rebuilt:
      Map<HServerInfo,List<Pair<HRegionInfo,Result>>> deadServers = rebuildUserRegions();

      3. Here is how rebuildUserRegions() works. Every region located on a server that is not online, including regions that have already been split, is added to that server's offlineRegions list in offlineServers.

      for (Result result : results) {
        Pair<HRegionInfo,HServerInfo> region =
        if (region == null) continue;
        HServerInfo regionLocation = region.getSecond();
        HRegionInfo regionInfo = region.getFirst();
        if (regionLocation == null) {
          // Region not being served, add to region map with no assignment
          // If this needs to be assigned out, it will also be in ZK as RIT
          this.regions.put(regionInfo, null);
        } else if (!serverManager.isServerOnline(
            regionLocation.getServerName())) {
          // Region is located on a server that isn't online
          List<Pair<HRegionInfo,Result>> offlineRegions =
          if (offlineRegions == null) {
            offlineRegions = new ArrayList<Pair<HRegionInfo,Result>>(1);
            offlineServers.put(regionLocation, offlineRegions);
          }
          offlineRegions.add(new Pair<HRegionInfo,Result>(regionInfo, result));
        } else {
          // Region is being served and on an active server
          regions.put(regionInfo, regionLocation);
          addToServers(regionLocation, regionInfo);
        }
      }
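The three-way classification in step 3 can be illustrated with a small self-contained sketch. It does not use HBase classes: RegionRow, RebuildSketch, and their fields are hypothetical stand-ins invented for illustration, and the `split` flag is an assumed way of marking a parent region that has already been split. The point it shows is that the split state is never consulted, so a split parent on an offline server lands in the offline list like any other region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for one catalog row; not an HBase class.
final class RegionRow {
    final String regionName;
    final String server;  // null when no location is recorded
    final boolean split;  // assumed flag: true for an already-split parent

    RegionRow(String regionName, String server, boolean split) {
        this.regionName = regionName;
        this.server = server;
        this.split = split;
    }
}

// Hypothetical model of the classification loop described in step 3.
final class RebuildSketch {
    final List<String> unassigned = new ArrayList<>();
    final Map<String, List<RegionRow>> offlineServers = new HashMap<>();
    final Map<String, String> assigned = new HashMap<>();

    void rebuild(List<RegionRow> rows, Set<String> onlineServers) {
        for (RegionRow row : rows) {
            if (row.server == null) {
                // No recorded location: tracked with no assignment.
                unassigned.add(row.regionName);
            } else if (!onlineServers.contains(row.server)) {
                // Server is not online. Note that row.split is not checked,
                // so split parents are collected here as well.
                offlineServers
                    .computeIfAbsent(row.server, s -> new ArrayList<>())
                    .add(row);
            } else {
                // Served by an online server.
                assigned.put(row.regionName, row.server);
            }
        }
    }

    public static void main(String[] args) {
        RebuildSketch sketch = new RebuildSketch();
        Set<String> online = new HashSet<>(Arrays.asList("rs1"));
        sketch.rebuild(Arrays.asList(
            new RegionRow("parent", "rs2", true),
            new RegionRow("daughterA", "rs1", false)), online);
        // The split parent is queued for the offline server "rs2".
        System.out.println(sketch.offlineServers.get("rs2").get(0).regionName);
    }
}
```

In this model, anything that later reassigns every entry of offlineServers would also reassign the split parent, which is the behavior the report describes.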


      4. It seems that all of these offline regions are then added to RIT (regions in transition) and brought online again:
      ZKAssign creates a node for each offline region and never considers whether the region has already been split.

      AssignmentManager#processDeadServers

      private void processDeadServers(
          Map<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServers)
      throws IOException, KeeperException {
        for (Map.Entry<HServerInfo, List<Pair<HRegionInfo,Result>>> deadServer :
            deadServers.entrySet()) {
          List<Pair<HRegionInfo,Result>> regions = deadServer.getValue();
          for (Pair<HRegionInfo,Result> region : regions) {
            HRegionInfo regionInfo = region.getFirst();
            Result result = region.getSecond();
            // If region was in transition (was in zk) force it offline for reassign
            try {
              ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
                  master.getServerName());
            } catch (KeeperException.NoNodeException nne) {
              // This is fine
            }
            // Process with existing RS shutdown code
            ServerShutdownHandler.processDeadRegion(regionInfo, result, this,

      AssignmentManager#processFailover

      // Process list of dead servers
      // Check existing regions in transition
      List<String> nodes = ZKUtil.listChildrenAndWatchForNewChildren(watcher,
      if (nodes.isEmpty()) {
        LOG.info("No regions in transition in ZK to process on failover");
        return;
      }
      LOG.info("Failed-over master needs to process " + nodes.size() +
          " regions in transition");
      for (String encodedRegionName: nodes) {
        processRegionInTransition(encodedRegionName, null);
      }

      So I think each region should be checked, in particular for whether it has already been split, before it is added to RIT.
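As a rough illustration of the proposed check, the sketch below filters the regions of a dead server before any of them would be forced offline for reassignment. It is only a guess at the shape of such a check, not the content of the attached patches: DeadRegion, FailoverFilterSketch, and regionsToReassign are hypothetical names, and the `split` flag is an assumed marker for a parent region that has already been split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a region found on a dead server; not an HBase class.
final class DeadRegion {
    final String name;
    final boolean split;  // assumed flag: true for an already-split parent

    DeadRegion(String name, boolean split) {
        this.name = name;
        this.split = split;
    }
}

final class FailoverFilterSketch {
    // Returns the names of regions that should go on to reassignment,
    // skipping any region that is marked as split.
    static List<String> regionsToReassign(List<DeadRegion> regionsOnDeadServer) {
        List<String> toReassign = new ArrayList<>();
        for (DeadRegion region : regionsOnDeadServer) {
            if (region.split) {
                // A split parent has been replaced by its daughters;
                // do not put it back into transition.
                continue;
            }
            toReassign.add(region.name);
        }
        return toReassign;
    }

    public static void main(String[] args) {
        List<DeadRegion> regions = Arrays.asList(
            new DeadRegion("parent", true),
            new DeadRegion("daughterA", false),
            new DeadRegion("daughterB", false));
        // prints [daughterA, daughterB]
        System.out.println(regionsToReassign(regions));
    }
}
```

Placing the check before the region is forced offline, rather than after, keeps the split parent from ever receiving a transition node in this model.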


        1. HBASE-3946.patch
          1 kB
          Jieshan Bean
        2. HBASE-3946-V2.patch
          1 kB
          Jieshan Bean



            • Assignee:
              jeason Jieshan Bean
            • Votes:
              0
            • Watchers:
              3


              • Created: