HBase
  1. HBase
  2. HBASE-2486

Add simple "anti-entropy" for region assignment

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.5
    • Fix Version/s: None
    • Component/s: master, regionserver
    • Release Note:
      Hide
      Adds new property "hbase.master.sanitychecking" which determines how master should handle situations where the master believes a region is hosted by a certain regionserver, but that regionserver indicates by throwing a 'No Such Region' exception, that it does not serve that region:

      lax - mark region as unassigned and continue
      paranoid - shut down master
      Show
      Adds new property "hbase.master.sanitychecking" which determines how master should handle situations where the master believes a region is hosted by a certain regionserver, but that regionserver indicates by throwing a 'No Such Region' exception, that it does not serve that region: lax - mark region as unassigned and continue paranoid - shut down master

      Description

      We've seen a number of bugs where a region server thinks it should not be serving a region, but the master and META think it should be. I'd like to propose a very simple way of fixing this issue:

      1) whenever a regionserver throws a NotServingRegionException, it also marks that region id in an RS-wide Set
      2) when a region sends a heartbeat, include a message for each of these regions, MSG_REPORT_NSRE or somesuch, and then clear the set
      3) when the master receives MSG_REPORT_NSRE, it does the following checks:
      a) if the region is assigned elsewhere according to META, the NSRE was due to a stale client, ignore
      b) if the region is in transition, ignore
      c) otherwise, we have an inconsistency, and we should take some steps to resolve (eg mark the region unassigned, or exit the master if we are in "paranoid mode")

      Whatever we do, we need to make sure that this is loudly logged, and causes unit tests to fail, when it's detected. This should not happen, but when it does, it would be good to recover without addtable.rb, etc.

      1. hbase2486.diff
        11 kB
        Eugene Koontz
      2. hbase2486.diff
        12 kB
        Eugene Koontz

        Issue Links

          Activity

          Hide
          stack added a comment -

          Moving out of 0.92.0. Pull it back in if you think different.

          Show
          stack added a comment - Moving out of 0.92.0. Pull it back in if you think different.
          Hide
          stack added a comment -

          Cancelling stale patch.

          Show
          stack added a comment - Cancelling stale patch.
          Hide
          stack added a comment -

          We need this but I think the urgency has had some of the air let out of it. Punting to 0.92 for now. Pull back in if the new bugs that master rewrite has introduced require this described mechanism.

          Show
          stack added a comment - We need this but I think the urgency has had some of the air let out of it. Punting to 0.92 for now. Pull back in if the new bugs that master rewrite has introduced require this described mechanism.
          Hide
          Jonathan Gray added a comment -

          Do we still need this now that we have the master rewrite and HBCK tool? Maybe punt to 0.92?

          Show
          Jonathan Gray added a comment - Do we still need this now that we have the master rewrite and HBCK tool? Maybe punt to 0.92?
          Hide
          Eugene Koontz added a comment -

          HBASE-2516 says that region server should return NotServingRegionException (NSRE) if region is marked as closed.

          How should the master verify consistency if the master receives a NSRE message for such a region?

          Show
          Eugene Koontz added a comment - HBASE-2516 says that region server should return NotServingRegionException (NSRE) if region is marked as closed. How should the master verify consistency if the master receives a NSRE message for such a region?
          Hide
          Eugene Koontz added a comment -

          HBASE-1502 would eliminate the need for heartbeat messages from regionservers to the master. Since HBASE-2486 fix relies on encoding NSRE region information in these heartbeat messages (an example of "piggybacking information on them when it can" (from HBASE-1502's description)), HBASE-2486 will need to be re-implemented as a zookeeper-related activity rather than a heartbeat-encoded one.

          Show
          Eugene Koontz added a comment - HBASE-1502 would eliminate the need for heartbeat messages from regionservers to the master. Since HBASE-2486 fix relies on encoding NSRE region information in these heartbeat messages (an example of "piggybacking information on them when it can" (from HBASE-1502 's description)), HBASE-2486 will need to be re-implemented as a zookeeper-related activity rather than a heartbeat-encoded one.
          Hide
          Eugene Koontz added a comment -

          Thanks, Stack, I will look at that test.

          Show
          Eugene Koontz added a comment - Thanks, Stack, I will look at that test.
          Hide
          stack added a comment -

          .bq Stack, ....I am hoping to write my unit tests in mockito after learning it - are there some examples of using mockito within hbase that I could use as starting points?

          No. Not yet. So strike my comment that you use mockito – some one of the committers should first provide examples of how to – but I think unit testing as much as you can of your new methods would be a good idea.

          Thanks for addressing my comments above. I'll review again when you add unit test. One unit test you could add is serializing and deserialzing a HMsg that has an empty regioninfo. See core/src/test/java/org/apache/hadoop/hbase/TestSerialization.java It has a testHMsg already. You could add to this test.

          Good stuff Eugene.

          Show
          stack added a comment - .bq Stack, ....I am hoping to write my unit tests in mockito after learning it - are there some examples of using mockito within hbase that I could use as starting points? No. Not yet. So strike my comment that you use mockito – some one of the committers should first provide examples of how to – but I think unit testing as much as you can of your new methods would be a good idea. Thanks for addressing my comments above. I'll review again when you add unit test. One unit test you could add is serializing and deserialzing a HMsg that has an empty regioninfo. See core/src/test/java/org/apache/hadoop/hbase/TestSerialization.java It has a testHMsg already. You could add to this test. Good stuff Eugene.
          Hide
          Eugene Koontz added a comment -

          I forgot to address this:

          "Does this work? + if (master.regionManager.regionIsInTransition(nsreRegion)) { Is regionsIntransition keyed by regionname as String (I haven't looked)."

          This works in my testing.

          "FYI, "// assumption: there is only one ROOT region, and it's called 'ROOT,,0'" is not an assumption, its a fact."

          Changed comment to reflect this.

          "Do you want to start the count at '0' instead? + int regionCount = 1;"

          Removed this variable regionCount, since it was only used for non-anomalous logging (for my development purposes).

          Show
          Eugene Koontz added a comment - I forgot to address this: "Does this work? + if (master.regionManager.regionIsInTransition(nsreRegion)) { Is regionsIntransition keyed by regionname as String (I haven't looked)." This works in my testing. "FYI, "// assumption: there is only one ROOT region, and it's called 'ROOT,,0'" is not an assumption, its a fact." Changed comment to reflect this. "Do you want to start the count at '0' instead? + int regionCount = 1;" Removed this variable regionCount, since it was only used for non-anomalous logging (for my development purposes).
          Hide
          Eugene Koontz added a comment -

          Hi Jonathan and Stack,

          Thanks again for your comments. I have created a new patch, against trunk, which addresses some of Stack's review comments, which I address also here in the comment to the attachment:

          "Your xml comment instead should move into a "description" element. See other configs. for the model:"

          Fixed now in hbase-default.xml in attached patch: xml comment text moved to <description> element.

          "What happens if the config. is other than lax or paranoid: e.g. the user misspells the config?"

          I decided to treat misspellings (neither "lax" nor "paranoid") as the same as the default, which is "lax", except that I put a LOG.warn() to alert the administrator of this fact.

          "As a general comment, you do not need to put HBASE-2486 on all your comments. Reading them, they are substantive enough w/o need of citing hbase-2486."

          "You don't need to put method name in log message; e.g. checkNSRERegion()."

          Removed most of these, except for two "(See HBASE-2486)" which I thought might be useful in keeping with other "(See HBASE-XXXX)" existing comments.

          "A log message that spans two log events is hard to grep for in logs. One event per log is usually best."

          "don't bother logging the 3.a consistent state. Its 'normal' so doesn't warrant logging."

          Removed multiple-line-spanning log events; replaced with single line log events or simply removed in case of non-anomalous events.

          "Rather than make a fake HRI, I'd say, make HMsg work w/ a null...."

          Have not yet addressed this : I plan to do so after writing some unit tests, which this bug still lacks.

          "I also now see why the numbering of items. You are trying to bring the comment from the issue over into the code. I'd say leave that out. Your comments stand by themselves w/o reference back to the issue."

          Removed issue-referencing numbering from comments; made comments stand on their own.

          Show
          Eugene Koontz added a comment - Hi Jonathan and Stack, Thanks again for your comments. I have created a new patch, against trunk, which addresses some of Stack's review comments, which I address also here in the comment to the attachment: "Your xml comment instead should move into a "description" element. See other configs. for the model:" Fixed now in hbase-default.xml in attached patch: xml comment text moved to <description> element. "What happens if the config. is other than lax or paranoid: e.g. the user misspells the config?" I decided to treat misspellings (neither "lax" nor "paranoid") as the same as the default, which is "lax", except that I put a LOG.warn() to alert the administrator of this fact. "As a general comment, you do not need to put HBASE-2486 on all your comments. Reading them, they are substantive enough w/o need of citing hbase-2486." "You don't need to put method name in log message; e.g. checkNSRERegion()." Removed most of these, except for two "(See HBASE-2486 )" which I thought might be useful in keeping with other "(See HBASE-XXXX)" existing comments. "A log message that spans two log events is hard to grep for in logs. One event per log is usually best." "don't bother logging the 3.a consistent state. Its 'normal' so doesn't warrant logging." Removed multiple-line-spanning log events; replaced with single line log events or simply removed in case of non-anomalous events. "Rather than make a fake HRI, I'd say, make HMsg work w/ a null...." Have not yet addressed this : I plan to do so after writing some unit tests, which this bug still lacks. "I also now see why the numbering of items. You are trying to bring the comment from the issue over into the code. I'd say leave that out. Your comments stand by themselves w/o reference back to the issue." Removed issue-referencing numbering from comments; made comments stand on their own.
          Hide
          Eugene Koontz added a comment -

          Jonathan, thank you for your comment; will create patch against trunk.

          Stack, thank you very much for all of your helpful review comments! I am hoping to write my unit tests in mockito after learning it - are there some examples of using mockito within hbase that I could use as starting points?

          Show
          Eugene Koontz added a comment - Jonathan, thank you for your comment; will create patch against trunk. Stack, thank you very much for all of your helpful review comments! I am hoping to write my unit tests in mockito after learning it - are there some examples of using mockito within hbase that I could use as starting points?
          Hide
          stack added a comment -

          @Eugene

          Your xml comment instead should move into a "description" element. See other configs. for the model:

          +    <!-- how to deal with situations such as
          +	 inconsistent beliefs between master and a regionserver concerning a region's location
          +	 (HBASE-2486) 
          +	 Two values are supported now: 'paranoid' (shut master down) 
          +         or 'lax' (mark region in question as unassigned).
          +      -->
          

          What happens if the config. is other than lax or paranoid: e.g. the user misspells the config?

          As a general comment, you do not need to put HBASE-2486 on all your comments. Reading them, they are substantive enough w/o need of citing hbase-2486.

          In checkNSRERegion, your comments start off w/ the number 3?

          Do you need to do this: + String nsreRegion = Bytes.toString(nsreMsg.getMessage());? Can't you put the offending region into the HMsg as its HRegionInfo payload? (The other HMsgs work this way)

          You don't need to put method name in log message; e.g. checkNSRERegion().

          Does this work? + if (master.regionManager.regionIsInTransition(nsreRegion)) { Is regionsIntransition keyed by regionname as String (I haven't looked).

          Regards logging, I'd say logging 'normal' status – e.g. that we're in 'consistent' state – is just going to make for noise in the logs. Log the anomaly's only I'd say.

          FYI, "// assumption: there is only one ROOT region, and it's called 'ROOT,,0'" is not an assumption, its a fact.

          Do you want to start the count at '0' instead? + int regionCount = 1;

          Don't do this:

          + LOG.debug("NSRE exception came from region server : " + nsreServerAddress.toString());
          + LOG.debug("according to regionManager, region server is : " + regionServerBelief);

          Put them together in the one message.

          Same for the Log.error later.

          A log message that spans two log events is hard to grep for in logs. One event per log is usually best.

          Yeah, don't bother logging the 3.a consistent state. Its 'normal' so doesn't warrant logging.

          Regards this method generally, if you did stuff like pass in the meta map of online regions as a param, and made the method static, then you might be able to write unit tests to cover a good portion of what is done in this method. For the fetch against the meta table, you might consider mocking this using something like the mockito framework. Unit tests would help here alot Eugene especially as this is you getting stuck in for the first time. Unit tests will help you understand better what is being passed around.

          I see now why you can't put an HRegionInfo as the second arg. to HMsg. Rather than make a fake HRI, I'd say, make HMsg work w/ a null....

          +    // HBASE 2486: 
          +    // "2) when a region sends a heartbeat, include a message for each of these regions, 
          +    //     MSG_REPORT_NSRE or somesuch, and then clear the set"
          +    byte[] nsre_region;
          +    // note that pollFirst() removes the first element from nsreSet as a side-effect of returning that element.
          +    // http://java.sun.com/javase/6/docs/api/java/util/NavigableSet.html#pollFirst()
          +    while ((nsre_region = nsreSet.pollFirst()) != null) {
          +      // create an empty 'fakeRegion', since HMsg's second argument
          +      // (an HRegionInfo) must not be null.
          +      HRegionInfo fake_region = new HRegionInfo();
          +      if (LOG.isDebugEnabled()) {
          +        LOG.debug("sending HMsg(MSG_REPORT_NSRE,fake_region,'" + Bytes.toString(nsre_region) + "') to master..");
          +      }
          +      getOutboundMsgs().add(new HMsg(HMsg.Type.MSG_REPORT_NSRE, fake_region, nsre_region));
          

          I also now see why the numbering of items. You are trying to bring the comment from the issue over into the code. I'd say leave that out. Your comments stand by themselves w/o reference back to the issue.

          Good stuff.

          Show
          stack added a comment - @Eugene Your xml comment instead should move into a "description" element. See other configs. for the model: + <!-- how to deal with situations such as + inconsistent beliefs between master and a regionserver concerning a region's location + (HBASE-2486) + Two values are supported now: 'paranoid' (shut master down) + or 'lax' (mark region in question as unassigned). + --> What happens if the config. is other than lax or paranoid: e.g. the user misspells the config? As a general comment, you do not need to put HBASE-2486 on all your comments. Reading them, they are substantive enough w/o need of citing hbase-2486. In checkNSRERegion, your comments start off w/ the number 3? Do you need to do this: + String nsreRegion = Bytes.toString(nsreMsg.getMessage());? Can't you put the offending region into the HMsg as its HRegionInfo payload? (The other HMsgs work this way) You don't need to put method name in log message; e.g. checkNSRERegion(). Does this work? + if (master.regionManager.regionIsInTransition(nsreRegion)) { Is regionsIntransition keyed by regionname as String (I haven't looked). Regards logging, I'd say logging 'normal' status – e.g. that we're in 'consistent' state – is just going to make for noise in the logs. Log the anomaly's only I'd say. FYI, "// assumption: there is only one ROOT region, and it's called ' ROOT ,,0'" is not an assumption, its a fact. Do you want to start the count at '0' instead? + int regionCount = 1; Don't do this: + LOG.debug("NSRE exception came from region server : " + nsreServerAddress.toString()); + LOG.debug("according to regionManager, region server is : " + regionServerBelief); Put them together in the one message. Same for the Log.error later. A log message that spans two log events is hard to grep for in logs. One event per log is usually best. Yeah, don't bother logging the 3.a consistent state. Its 'normal' so doesn't warrant logging. Regards this method generally, if you did stuff like pass in the meta map of online regions as a param, and made the method static, then you might be able to write unit tests to cover a good portion of what is done in this method. For the fetch against the meta table, you might consider mocking this using something like the mockito framework. Unit tests would help here alot Eugene especially as this is you getting stuck in for the first time. Unit tests will help you understand better what is being passed around. I see now why you can't put an HRegionInfo as the second arg. to HMsg. Rather than make a fake HRI, I'd say, make HMsg work w/ a null.... + // HBASE 2486: + // "2) when a region sends a heartbeat, include a message for each of these regions, + // MSG_REPORT_NSRE or somesuch, and then clear the set" + byte [] nsre_region; + // note that pollFirst() removes the first element from nsreSet as a side-effect of returning that element. + // http://java.sun.com/javase/6/docs/api/java/util/NavigableSet.html#pollFirst() + while ((nsre_region = nsreSet.pollFirst()) != null ) { + // create an empty 'fakeRegion', since HMsg's second argument + // (an HRegionInfo) must not be null . + HRegionInfo fake_region = new HRegionInfo(); + if (LOG.isDebugEnabled()) { + LOG.debug( "sending HMsg(MSG_REPORT_NSRE,fake_region,'" + Bytes.toString(nsre_region) + "') to master.." ); + } + getOutboundMsgs().add( new HMsg(HMsg.Type.MSG_REPORT_NSRE, fake_region, nsre_region)); I also now see why the numbering of items. You are trying to bring the comment from the issue over into the code. I'd say leave that out. Your comments stand by themselves w/o reference back to the issue. Good stuff.
          Hide
          Jonathan Gray added a comment -

          @Eugene, this will likely not be put into the 0.20 line. I would recommend rebasing this against trunk.

          Show
          Jonathan Gray added a comment - @Eugene, this will likely not be put into the 0.20 line. I would recommend rebasing this against trunk.
          Hide
          Eugene Koontz added a comment -

          Please note that I am working on unit tests for this bug, but they are not included in the above patch (14/May/10 06:43 PM).

          Show
          Eugene Koontz added a comment - Please note that I am working on unit tests for this bug, but they are not included in the above patch (14/May/10 06:43 PM).
          Hide
          Eugene Koontz added a comment -

          patch for HBASE-2486

          Show
          Eugene Koontz added a comment - patch for HBASE-2486
          Hide
          Eugene Koontz added a comment -

          Note that this patch is against 0.20.3; I can create another patch against 0.20.4 and/or other tags if desired by reviewer.

          Show
          Eugene Koontz added a comment - Note that this patch is against 0.20.3; I can create another patch against 0.20.4 and/or other tags if desired by reviewer.
          Hide
          stack added a comment -

          Bulk move of 0.20.5 issues into 0.21.0 after vote that we merge branch into TRUNK up on list.

          Show
          stack added a comment - Bulk move of 0.20.5 issues into 0.21.0 after vote that we merge branch into TRUNK up on list.
          Hide
          Kevin Peterson added a comment -

          Wanted to document killing and restarting the master as a workaround. This solves at least some circumstances where the master and .META. disagree with the regionserver about what regions it is hosting. I did the following and it worked for me:

          1. Tail log on master to ensure that the master is not doing anything.
          2. Kill -9 the master to exit without setting shutdown node in ZK.
          3. Restart the master.

          This ended the repeated NotServingRegionException I had been seeing without needing to bring down the cluster. YMMV.

          Show
          Kevin Peterson added a comment - Wanted to document killing and restarting the master as a workaround. This solves at least some circumstances where the master and .META. disagree with the regionserver about what regions it is hosting. I did the following and it worked for me: 1. Tail log on master to ensure that the master is not doing anything. 2. Kill -9 the master to exit without setting shutdown node in ZK. 3. Restart the master. This ended the repeated NotServingRegionException I had been seeing without needing to bring down the cluster. YMMV.
          Hide
          Eugene Koontz added a comment -

          2526 seems related since they both have to do with ensuring/testing/verifying of the .META. table (which maps table regions to specific region servers).

          Show
          Eugene Koontz added a comment - 2526 seems related since they both have to do with ensuring/testing/verifying of the .META. table (which maps table regions to specific region servers).
          Hide
          stack added a comment -

          Assigning Eugene at his request.

          Show
          stack added a comment - Assigning Eugene at his request.
          Hide
          Eugene Koontz added a comment -

          2408 and 2486 seem related because 2408 mentions Not Serving Region Exceptions as an example of where it would be better to use client-server communication rather than exception throwing:

          "HBASE-72 "'Normal' operation should not depend on throwing of exceptions (e.g. NotServingRegionException)". Rather than have the server throw a NotServingRegionException as we do now as signal to client to go look elsewhere for the wanted data, we could instead signal the client to look elsewhere by setting a state in the envelope."

          Show
          Eugene Koontz added a comment - 2408 and 2486 seem related because 2408 mentions Not Serving Region Exceptions as an example of where it would be better to use client-server communication rather than exception throwing: " HBASE-72 "'Normal' operation should not depend on throwing of exceptions (e.g. NotServingRegionException)". Rather than have the server throw a NotServingRegionException as we do now as signal to client to go look elsewhere for the wanted data, we could instead signal the client to look elsewhere by setting a state in the envelope."
          Hide
          Todd Lipcon added a comment -

          Is HBASE-1920 still valid? I've been testing lots of region server death/restarts/etc with write workloads and not seen that. But maybe with a read workload it doesn't respond the same way.

          Show
          Todd Lipcon added a comment - Is HBASE-1920 still valid? I've been testing lots of region server death/restarts/etc with write workloads and not seen that. But maybe with a read workload it doesn't respond the same way.
          Hide
          stack added a comment -

          This don't look too hard too do. I was thinking that we need to make sure that 3a and 3b don't have holes but if a 3c, then then we'll find them the quicker.

          Doing Todd' technique above would invalidate need for HBASE-1920?

          Show
          stack added a comment - This don't look too hard too do. I was thinking that we need to make sure that 3a and 3b don't have holes but if a 3c, then then we'll find them the quicker. Doing Todd' technique above would invalidate need for HBASE-1920 ?
          Hide
          Andrew Purtell added a comment -

          Bringing into 0.20.5. I presume the master rewrite will incorporate something similar but implemented with a different strategy.

          Show
          Andrew Purtell added a comment - Bringing into 0.20.5. I presume the master rewrite will incorporate something similar but implemented with a different strategy.

            People

            • Assignee:
              Eugene Koontz
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:

                Development