HBase: HBASE-5829

Inconsistency between the "regions" map and the "servers" map in AssignmentManager

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.6, 0.92.1
    • Fix Version/s: 0.95.0
    • Component/s: master
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      There are places in the AssignmentManager (AM) where this.servers is not kept consistent with this.regions. This can cause the balancer to try to offline a region from a region server (RS) that already returned NotServingRegionException on a previous offline attempt.

      In AssignmentManager.unassign(HRegionInfo, boolean):

        try {
          // TODO: We should consider making this look more like it does for the
          // region open where we catch all throwables and never abort
          if (serverManager.sendRegionClose(server, state.getRegion(),
              versionOfClosingNode)) {
            LOG.debug("Sent CLOSE to " + server + " for region " +
              region.getRegionNameAsString());
            return;
          }
          // This never happens. Currently regionserver close always return true.
          LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
            region.getRegionNameAsString());
        } catch (NotServingRegionException nsre) {
          LOG.info("Server " + server + " returned " + nsre + " for " +
            region.getRegionNameAsString());
          // Presume that master has stale data. Presume remote side just split.
          // Presume that the split message when it comes in will fix up the master's
          // in memory cluster state.
        } catch (Throwable t) {
          if (t instanceof RemoteException) {
            t = ((RemoteException)t).unwrapRemoteException();
            if (t instanceof NotServingRegionException) {
              if (checkIfRegionBelongsToDisabling(region)) {
                // Remove from the regionsinTransition map
                LOG.info("While trying to recover the table "
                  + region.getTableNameAsString()
                  + " to DISABLED state the region " + region
                  + " was offlined but the table was in DISABLING state");
                synchronized (this.regionsInTransition) {
                  this.regionsInTransition.remove(region.getEncodedName());
                }
                // Remove from the regionsMap
                synchronized (this.regions) {
                  this.regions.remove(region);
                }
                deleteClosingOrClosedNode(region);
              }
            }
            // RS is already processing this region, only need to update the timestamp
            if (t instanceof RegionAlreadyInTransitionException) {
              LOG.debug("update " + state + " the timestamp.");
              state.update(state.getState());
            }
          }
          // ...
        }

      In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean):

        synchronized (this.regions) {
          this.regions.put(plan.getRegionInfo(), plan.getDestination());
        }
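
One way to express the fix is to route every change to this.regions through a single place that also updates this.servers, under the same lock. The sketch below is a minimal, hypothetical illustration, not the HBase code: it assumes this.regions maps a region to its hosting server and this.servers maps a server to the set of regions it hosts, it uses plain String names in place of HRegionInfo and ServerName, and the class RegionIndex with its put, remove, regionsOn and serverOf methods is invented for this example. Like the excerpts above, it locks on the regions map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a region -> server map and a server -> regions map
// that are only ever mutated together, under a single lock.
public class RegionIndex {
  // Stand-in for AssignmentManager.regions (region name -> hosting server).
  private final Map<String, String> regions = new HashMap<>();
  // Stand-in for AssignmentManager.servers (server name -> hosted regions).
  private final Map<String, Set<String>> servers = new HashMap<>();

  /** Records that region is hosted on server, moving it off any previous server. */
  public void put(String region, String server) {
    synchronized (regions) {
      String previous = regions.put(region, server);
      if (previous != null) {
        removeFromServer(previous, region);
      }
      servers.computeIfAbsent(server, s -> new TreeSet<>()).add(region);
    }
  }

  /** Forgets region in both maps (the unassign-side counterpart of put). */
  public void remove(String region) {
    synchronized (regions) {
      String previous = regions.remove(region);
      if (previous != null) {
        removeFromServer(previous, region);
      }
    }
  }

  /** Returns a sorted copy of the regions currently recorded for server. */
  public Set<String> regionsOn(String server) {
    synchronized (regions) {
      Set<String> hosted = servers.get(server);
      return hosted == null ? new TreeSet<>() : new TreeSet<>(hosted);
    }
  }

  /** Returns the server recorded for region, or null if none. */
  public String serverOf(String region) {
    synchronized (regions) {
      return regions.get(region);
    }
  }

  // Caller must already hold the lock on regions.
  private void removeFromServer(String server, String region) {
    Set<String> hosted = servers.get(server);
    if (hosted != null) {
      hosted.remove(region);
      if (hosted.isEmpty()) {
        servers.remove(server);
      }
    }
  }

  public static void main(String[] args) {
    RegionIndex index = new RegionIndex();
    index.put("r1", "rs1");
    index.put("r2", "rs1");
    index.put("r1", "rs2"); // move r1 from rs1 to rs2
    System.out.println(index.regionsOn("rs1")); // [r2]
    System.out.println(index.regionsOn("rs2")); // [r1]
    index.remove("r2");
    System.out.println(index.regionsOn("rs1")); // []
  }
}
```

Because every mutation goes through put or remove while holding the regions lock, a reader that takes the same lock never sees a region recorded in one map but not the other. In this sketch the assign path corresponds to put and the unassign path to remove.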

      Attachments
        1. HBASE-5829-0.90.patch (0.7 kB, Maryann Xue)
        2. HBASE-5829-trunk.patch (1.0 kB, Maryann Xue)

        Activity

        stack added a comment -

        Marking closed.

        Hudson added a comment -

        Integrated in HBase-TRUNK-security #186 (See https://builds.apache.org/job/HBase-TRUNK-security/186/)
        HBASE-5829 Inconsistency between the "regions" map and the "servers" map in AssignmentManager (Revision 1330993)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        stack added a comment -

        Applied to trunk. Letting patch hang out in case someone wants to apply it to other branches.

        I added you as a contributor Maryann and assigned you this issue (You can assign yourself issues going forward). Thanks for the patch.

        Ted Yu added a comment -

        The latest patch is good to go.
        The useless statement can be addressed elsewhere.

        stack added a comment -

        @Ted Make a new issue?

        Ted Yu added a comment -

        Patch makes sense.
        w.r.t. this.servers, I found a useless statement (at least in trunk):

          void unassignCatalogRegions() {
            this.servers.entrySet();
        

        that should be removed.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12524120/HBASE-5829-trunk.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.TestRegionRebalancing

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1643//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1643//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1643//console

        This message is automatically generated.

        Maryann Xue added a comment -

        @ For the second, I think we should guarantee that the region is also added to the map "this.servers".

        Maryann Xue added a comment -

        Add corresponding operations to this.servers

        stack added a comment -

        Do you have a patch for us, Maryann? The first at least seems legit. (For the second, there is no associated server, right?)

        Maryann Xue added a comment -

        In AssignmentManager.unassign(HRegionInfo, boolean):

          // Remove from the regionsMap
          synchronized (this.regions) {
            this.regions.remove(region);
          }

        In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean):

          synchronized (this.regions) {
            this.regions.put(plan.getRegionInfo(), plan.getDestination());
          }

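        In both places this.regions is updated but this.servers is not: the region is neither removed from nor added to the corresponding server's entry in this.servers. This might cause the balancer to generate incorrect region plans.
        After the fix for HBASE-5563, it seems this problem no longer causes an endless loop of wrong balancing decisions or a region stuck in transition.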
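
To make the failure mode concrete, the following hypothetical sketch mirrors the two excerpts above using plain maps, with String names standing in for HRegionInfo and ServerName (the class StaleServersDemo and its consistent helper are invented for illustration). It updates only the region -> server map, as the unpatched code does, and checks whether the server -> regions map still agrees. A balancer that computes per-server load from the stale map would still count the removed region against its old server.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the regions/servers inconsistency.
public class StaleServersDemo {

  /** True if every region -> server entry is mirrored in servers, and vice versa. */
  static boolean consistent(Map<String, String> regions, Map<String, Set<String>> servers) {
    for (Map.Entry<String, String> e : regions.entrySet()) {
      Set<String> hosted = servers.get(e.getValue());
      if (hosted == null || !hosted.contains(e.getKey())) {
        return false;
      }
    }
    for (Map.Entry<String, Set<String>> e : servers.entrySet()) {
      for (String region : e.getValue()) {
        if (!e.getKey().equals(regions.get(region))) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> regions = new HashMap<>();
    Map<String, Set<String>> servers = new HashMap<>();
    regions.put("r1", "rs1");
    servers.computeIfAbsent("rs1", s -> new HashSet<>()).add("r1");
    System.out.println(consistent(regions, servers)); // true

    // Like the unassign() excerpt: only the regions map is updated.
    regions.remove("r1");
    System.out.println(consistent(regions, servers)); // false
    // Anything reading servers still believes rs1 hosts one region.
    System.out.println(servers.get("rs1").size()); // 1

    // Like the assign() excerpt: r1 is recorded on rs2 in regions only.
    regions.put("r1", "rs2");
    System.out.println(consistent(regions, servers)); // false
  }
}
```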

        stack added a comment -

        Please explain where in the code the disparity between this.servers and this.regions is, Maryann.


          People

          • Assignee: Maryann Xue
          • Reporter: Maryann Xue
          • Votes: 0
          • Watchers: 3
