HBase
  1. HBase
  2. HBASE-3946

The splitted region can be online again while the standby hmaster becomes the active one

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.3
    • Fix Version/s: 0.90.4
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      (The cluster has two HMatser, one active and one standby)

      1.While the active HMaster shutdown, the standby one would become the active one, and went into the processFailover() method:
      if (regionCount == 0)

      { LOG.info("Master startup proceeding: cluster startup"); this.assignmentManager.cleanoutUnassigned(); this.assignmentManager.assignAllUserRegions(); }

      else

      { LOG.info("Master startup proceeding: master failover"); this.assignmentManager.processFailover(); }

      2.After that, the user regions would be rebuild.
      Map<HServerInfo,List<Pair<HRegionInfo,Result>>> deadServers = rebuildUserRegions();

      3.Here's how the rebuildUserRegions worked. All the regions(contain the splitted regions) would be added to the offlineRegions of offlineServers.

      for (Result result : results) {
      Pair<HRegionInfo,HServerInfo> region =
      MetaReader.metaRowToRegionPairWithInfo(result);
      if (region == null) continue;
      HServerInfo regionLocation = region.getSecond();
      HRegionInfo regionInfo = region.getFirst();
      if (regionLocation == null)

      { // Region not being served, add to region map with no assignment // If this needs to be assigned out, it will also be in ZK as RIT this.regions.put(regionInfo, null); }

      else if (!serverManager.isServerOnline(
      regionLocation.getServerName())) {
      // Region is located on a server that isn't online
      List<Pair<HRegionInfo,Result>> offlineRegions =
      offlineServers.get(regionLocation);
      if (offlineRegions == null)

      { offlineRegions = new ArrayList<Pair<HRegionInfo,Result>>(1); offlineServers.put(regionLocation, offlineRegions); }

      offlineRegions.add(new Pair<HRegionInfo,Result>(regionInfo, result));
      } else

      { // Region is being served and on an active server regions.put(regionInfo, regionLocation); addToServers(regionLocation, regionInfo); }

      }

      4.It seems that all the offline regions will be added to RIT and online again:
      ZKAssign will creat node for each offline never consider the splitted ones.

      AssignmentManager# processDeadServers
      private void processDeadServers(
      Map<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServers)
      throws IOException, KeeperException {
      for (Map.Entry<HServerInfo, List<Pair<HRegionInfo,Result>>> deadServer :
      deadServers.entrySet()) {
      List<Pair<HRegionInfo,Result>> regions = deadServer.getValue();
      for (Pair<HRegionInfo,Result> region : regions) {
      HRegionInfo regionInfo = region.getFirst();
      Result result = region.getSecond();
      // If region was in transition (was in zk) force it offline for reassign
      try

      { ZKAssign.createOrForceNodeOffline(watcher, regionInfo, master.getServerName()); }

      catch (KeeperException.NoNodeException nne)

      { // This is fine }

      // Process with existing RS shutdown code
      ServerShutdownHandler.processDeadRegion(regionInfo, result, this,
      this.catalogTracker);
      }
      }
      }

      AssignmentManager# processFailover
      // Process list of dead servers
      processDeadServers(deadServers);
      // Check existing regions in transition
      List<String> nodes = ZKUtil.listChildrenAndWatchForNewChildren(watcher,
      watcher.assignmentZNode);
      if (nodes.isEmpty())

      { LOG.info("No regions in transition in ZK to process on failover"); return; }

      LOG.info("Failed-over master needs to process " + nodes.size() +
      " regions in transition");
      for (String encodedRegionName: nodes)

      { processRegionInTransition(encodedRegionName, null); }

      So I think before add the region into RIT, check it at first.

      1. HBASE-3946.patch
        1 kB
        Jieshan Bean
      2. HBASE-3946-V2.patch
        1 kB
        Jieshan Bean

        Activity

        Hide
        Jieshan Bean added a comment -

        For the patch I haven't take enough test on it. But I can describe my solution:

        Index: src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        ===================================================================
        --- src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java	(revision 1130364)
        +++ src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java	(working copy)
        @@ -1470,14 +1470,17 @@
                 Result result = region.getSecond();
                 // If region was in transition (was in zk) force it offline for reassign
                 try {
        -          ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
        -              master.getServerName());
        +          //Process with existing RS shutdown code  
        +          boolean  isNotDisabledAndSplitted = 
        +            ServerShutdownHandler.processDeadRegion(regionInfo, result, this,
        +              this.catalogTracker);    
        +          if (isNotDisabledAndSplitted)  {
        +            ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
        +              master.getServerName()); 
        +          }
                 } catch (KeeperException.NoNodeException nne) {
                   // This is fine
                 }
        -        // Process with existing RS shutdown code
        -        ServerShutdownHandler.processDeadRegion(regionInfo, result, this,
        -            this.catalogTracker);
               }
             }
           }
        
        Show
        Jieshan Bean added a comment - For the patch I haven't take enough test on it. But I can describe my solution: Index: src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java =================================================================== --- src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java (revision 1130364) +++ src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java (working copy) @@ -1470,14 +1470,17 @@ Result result = region.getSecond(); // If region was in transition (was in zk) force it offline for reassign try { - ZKAssign.createOrForceNodeOffline(watcher, regionInfo, - master.getServerName()); + //Process with existing RS shutdown code + boolean isNotDisabledAndSplitted = + ServerShutdownHandler.processDeadRegion(regionInfo, result, this, + this.catalogTracker); + if (isNotDisabledAndSplitted) { + ZKAssign.createOrForceNodeOffline(watcher, regionInfo, + master.getServerName()); + } } catch (KeeperException.NoNodeException nne) { // This is fine } - // Process with existing RS shutdown code - ServerShutdownHandler.processDeadRegion(regionInfo, result, this, - this.catalogTracker); } } }
        Hide
        Jieshan Bean added a comment -

        I have taken some tests on this patch, and it indeed works.
        Stack, can you give out your suggestions on this patch? Thanks!

        Show
        Jieshan Bean added a comment - I have taken some tests on this patch, and it indeed works. Stack, can you give out your suggestions on this patch? Thanks!
        Hide
        Jieshan Bean added a comment -

        Sorry for the prev patch name is wrong.
        This patch just modified two places of the code format.

        Show
        Jieshan Bean added a comment - Sorry for the prev patch name is wrong. This patch just modified two places of the code format.
        Hide
        stack added a comment -

        Committed to branch and trunk. Thanks for the patch Jieshan.

        Show
        stack added a comment - Committed to branch and trunk. Thanks for the patch Jieshan.
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #1976 (See https://builds.apache.org/job/HBase-TRUNK/1976/)

        Show
        Hudson added a comment - Integrated in HBase-TRUNK #1976 (See https://builds.apache.org/job/HBase-TRUNK/1976/ )
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #1995 (See https://builds.apache.org/job/HBase-TRUNK/1995/)

        Show
        Hudson added a comment - Integrated in HBase-TRUNK #1995 (See https://builds.apache.org/job/HBase-TRUNK/1995/ )

          People

          • Assignee:
            Jieshan Bean
            Reporter:
            Jieshan Bean
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development