HBase: HBASE-72

[hbase] 'Normal' operation should not depend on throwing of exceptions (e.g. NotServingRegionException)

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Region server and client logs fill up with entries like the following when a cluster is being loaded:

      org.apache.hadoop.hbase.NotServingRegionException: hbaserepository,,7144829661993961256
              at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1208)
              at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1180)
              at org.apache.hadoop.hbase.HRegionServer.startUpdate(HRegionServer.java:1122)
              at org.apache.hadoop.hbase.HRegionServer.batchUpdate(HRegionServer.java:985)
              at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
      

      NotServingRegionException is thrown when the remote server is no longer serving the requested region (usually because it has been split). The server throws the exception to provoke the client into re-querying region locations.
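A minimal sketch of that exception-driven pattern, using simplified stand-ins rather than HBase's real classes: `RegionServerStub`, `ExceptionDrivenClient` and the `catalog` map (standing in for a lookup of current region locations) are hypothetical, and `NotServingRegionException` here is a local class so the example compiles on its own. The server throws when asked for a region it does not hold; the client catches the exception, looks the region up again, and retries.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Local stand-in for HBase's NotServingRegionException so the sketch compiles on its own.
class NotServingRegionException extends IOException {
    NotServingRegionException(String regionName) {
        super(regionName);
    }
}

// Hypothetical region server: it only knows which regions it holds right now.
class RegionServerStub {
    private final Set<String> onlineRegions;

    RegionServerStub(Set<String> onlineRegions) {
        this.onlineRegions = onlineRegions;
    }

    void batchUpdate(String regionName, String row) throws NotServingRegionException {
        if (!onlineRegions.contains(regionName)) {
            // A routine condition (e.g. the region was split) is reported by throwing.
            throw new NotServingRegionException(regionName);
        }
        // ... apply the update ...
    }
}

// Hypothetical client that caches a region location and relies on the exception to refresh it.
class ExceptionDrivenClient {
    private final Map<String, RegionServerStub> catalog; // stands in for a region-location lookup
    private RegionServerStub cachedServer;               // may be stale
    int relocations = 0;

    ExceptionDrivenClient(Map<String, RegionServerStub> catalog, RegionServerStub cachedServer) {
        this.catalog = catalog;
        this.cachedServer = cachedServer;
    }

    void commit(String regionName, String row, int maxRetries) throws IOException {
        for (int attempt = 0; ; attempt++) {
            try {
                cachedServer.batchUpdate(regionName, row);
                return;
            } catch (NotServingRegionException e) {
                if (attempt >= maxRetries) {
                    throw e; // out of retries: now it really is an error
                }
                // The exception acts as a control-flow signal: look the region up again and retry.
                cachedServer = catalog.get(regionName);
                relocations++;
            }
        }
    }
}
```

The `catch` block here is ordinary control flow: nothing has gone wrong, yet the only way the server can say "ask elsewhere" is by throwing.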

      It would be an improvement if such 'normal' operation were not built on exceptions. For example, commits might return a 'region moved' message.
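For comparison, a sketch of the suggested alternative, in which a commit reports a misdirected request as a return value instead of an exception. `CommitStatus`, `StatusRegionServer` and `StatusCheckingClient` are hypothetical names for illustration only; HBase does not define them. The status says only that this server does not hold the region, not where it went.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical outcome of a commit; HBase does not define this type.
enum CommitStatus { OK, REGION_NOT_HERE }

// Hypothetical server that reports a misdirected commit as a return value.
class StatusRegionServer {
    private final Set<String> onlineRegions;

    StatusRegionServer(Set<String> onlineRegions) {
        this.onlineRegions = onlineRegions;
    }

    CommitStatus commit(String regionName, String row) {
        if (!onlineRegions.contains(regionName)) {
            return CommitStatus.REGION_NOT_HERE; // routine, so no exception
        }
        // ... apply the update ...
        return CommitStatus.OK;
    }
}

class StatusCheckingClient {
    /** Commits, re-looking up the region on REGION_NOT_HERE; returns how many relocations it took. */
    static int commitWithRelocation(Map<String, StatusRegionServer> catalog, StatusRegionServer cached,
                                    String regionName, String row, int maxRetries) {
        StatusRegionServer server = cached;
        int relocations = 0;
        while (server.commit(regionName, row) == CommitStatus.REGION_NOT_HERE) {
            if (relocations >= maxRetries) {
                // Running out of retries is still a genuine error.
                throw new IllegalStateException("region not found after retries: " + regionName);
            }
            server = catalog.get(regionName); // stands in for a region-location lookup
            relocations++;
        }
        return relocations;
    }
}
```

The trade-off is that every caller must remember to check the status, whereas an exception cannot be silently ignored.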

          Activity

          Ted Yu added a comment -

          It is desirable to associate NotServingRegionException (its frequency, e.g.) with other recent errors on the region server, to alert the user to the real problem.
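One way to act on this suggestion, sketched under assumptions: `NsreMonitor` is a hypothetical helper (not part of HBase) that keeps timestamps of recent NotServingRegionExceptions and of other errors in a sliding time window, and raises an alert only when NSREs are frequent and also coincide with other recent errors. Timestamps are supplied by the caller in milliseconds and are assumed to be non-decreasing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical monitor, not part of HBase. Times are milliseconds supplied by the caller.
class NsreMonitor {
    private final long windowMillis;
    private final int nsreThreshold;
    private final Deque<Long> nsreTimes = new ArrayDeque<>();
    private final Deque<Long> otherErrorTimes = new ArrayDeque<>();

    NsreMonitor(long windowMillis, int nsreThreshold) {
        this.windowMillis = windowMillis;
        this.nsreThreshold = nsreThreshold;
    }

    void recordNsre(long now) {
        nsreTimes.addLast(now);
        evictOlderThanWindow(now);
    }

    void recordOtherError(long now) {
        otherErrorTimes.addLast(now);
        evictOlderThanWindow(now);
    }

    /** Alert only when NSREs are frequent and other errors happened in the same window. */
    boolean shouldAlert(long now) {
        evictOlderThanWindow(now);
        return nsreTimes.size() >= nsreThreshold && !otherErrorTimes.isEmpty();
    }

    private void evictOlderThanWindow(long now) {
        evict(nsreTimes, now);
        evict(otherErrorTimes, now);
    }

    // Drop timestamps that have fallen out of the window.
    private void evict(Deque<Long> times, long now) {
        while (!times.isEmpty() && now - times.peekFirst() > windowMillis) {
            times.removeFirst();
        }
    }
}
```

A burst of NSREs on its own (for example, during a split storm under load) stays quiet; it is only flagged when something else is also going wrong on the server.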

          stack added a comment -

          From the mailing list, a new user who is confused by this 'Exception' and thinks it is an error:

          Message-ID: <25e5a0c00804151554g5078a436pac74512979654294@mail.gmail.com>
          Date: Tue, 15 Apr 2008 15:54:29 -0700
          From: "Daniel Leffel" <daniel.leffel@gmail.com>
          To: hbase-user@hadoop.apache.org
          Subject: Re: Region Server Processes Exit Unexpectedly during Moderate Load MapReduce
          In-Reply-To: <25e5a0c00804151543o6e7ef438y6dfcb8babc6c582f@mail.gmail.com>
          References: <25e5a0c00804150907p5f1c9796pf7ab288322ad70ab@mail.gmail.com>
          	 <4804D3F6.4090002@duboce.net>
          	 <25e5a0c00804151543o6e7ef438y6dfcb8babc6c582f@mail.gmail.com>
          
          Now I just got this exception:
          
          2008-04-15 18:50:02,107 DEBUG org.apache.hadoop.hbase.HStore: maximum
          sequence id for hstore 856584617/rule_id is 68555204
          2008-04-15 18:50:02,284 INFO org.apache.hadoop.ipc.Server: IPC Server
          handler 8 on 60020, call
          batchUpdate(category_rule_pricebin_statistics,2332627_1_-11,1208293443363,
          9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@19d277e)
          from 10.252.50.36:53955: error:
          org.apache.hadoop.hbase.NotServingRegionException:
          category_rule_pricebin_statistics,2332627_1_-11,1208293443363
          org.apache.hadoop.hbase.NotServingRegionException:
          category_rule_pricebin_statistics,2332627_1_-11,1208293443363
          ...
          
          Bryan Duxbury added a comment -

          Can we close this issue? If there were significant desire to refactor this interaction, it would have been done by now.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org
          against trunk revision r582413.

          @author +1. The patch does not contain any @author tags.

          patch -1. The patch command could not apply the patch.

          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/894/console

          This message is automatically generated.

          Jim Kellerman added a comment -

          Re-opening issue.

          While it would be an improvement, there are a couple of reasons not to do it:

          • It changes the wire protocol. Because we cannot overload a method when the only change is in its return type, we can't deprecate the current method and replace it with one of the same name. Instead, we would have to create a new API with a different name.
          • What we currently have works, and perhaps the problem described above is a good reason not to do it.
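The overloading constraint in the first point is a Java language rule: two methods with the same name and parameter types cannot differ only in return type. A minimal sketch of the rename-and-deprecate route, assuming a hypothetical `RegionInterfaceSketch` interface, a hypothetical `batchUpdateWithStatus` method and a local `RegionNotHereException`; none of these are HBase's actual API. In this sketch the old method is kept as a deprecated default that delegates to the new one.

```java
// Local stand-in exception; hypothetical, not an HBase class.
class RegionNotHereException extends Exception {
    RegionNotHereException(String regionName) {
        super(regionName);
    }
}

// Hypothetical interface illustrating the rename-and-deprecate route; not HBase's actual API.
interface RegionInterfaceSketch {
    // New method: same parameters, new name, returns false if the region is not served here.
    boolean batchUpdateWithStatus(String regionName, String row);

    // This would not compile next to the method below, because the two differ only in return type:
    // boolean batchUpdate(String regionName, String row);

    // Old method kept for existing callers, implemented on top of the new one.
    @Deprecated
    default void batchUpdate(String regionName, String row) throws RegionNotHereException {
        if (!batchUpdateWithStatus(regionName, row)) {
            throw new RegionNotHereException(regionName);
        }
    }
}
```

Keeping the old method as a thin wrapper is one way to let existing callers continue to work during a deprecation period, at the cost of carrying two names for the same operation.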
          Jim Kellerman added a comment -

          Since there are other implementations that subclass HRegionServer and HTable, we should not change the API without a better reason for doing so.

          stack added a comment -

          I didn't realize the exception only shows at DEBUG level. That undoes the "exceptions in the logs are unsettling" argument, but even though it's not showing, we are still using the exception mechanism to handle normal operation.

          Regarding the 'region moved' message: yes, region servers have no memory of regions past, but I was thinking the commit would return nothing more than the boolean (succeeded/failed) that you suggested, or a 'region ain't here'.

          Jim Kellerman added a comment -

          A message is written to the logs only if DEBUG-level logging is enabled. Under normal operation the log level should be set to INFO, in which case no stack trace will be visible unless the client runs out of retries (in which case both NotServingRegionException and WrongRegionException should be treated as errors).
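A sketch of the logging policy described here, using java.util.logging, where `Level.FINE` plays roughly the role of DEBUG and the default configuration shows only INFO and above. `RetryLogging.runWithRetries` is a hypothetical helper, and the `op` supplier stands in for one attempt at an RPC, returning false when the region is not served at the cached location.

```java
import java.util.function.BooleanSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

class RetryLogging {
    private static final Logger LOG = Logger.getLogger(RetryLogging.class.getName());

    /**
     * Hypothetical helper. Runs op until it returns true or maxRetries retries are used up.
     * Each failed attempt is logged at FINE, which the default INFO configuration hides;
     * only exhausting the retries is logged at SEVERE.
     */
    static boolean runWithRetries(BooleanSupplier op, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (op.getAsBoolean()) {
                return true;
            }
            if (attempt < maxRetries) {
                LOG.log(Level.FINE, "Region not served at cached location; retry {0} of {1}",
                        new Object[] {attempt + 1, maxRetries});
            }
        }
        LOG.log(Level.SEVERE, "Giving up after {0} retries", maxRetries);
        return false;
    }
}
```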

          commit cannot return a "region moved" message because the region server has no idea what happened earlier. It only knows that the region requested of it is not one of the ones it is serving.

          Debug logs are not pretty, and anyone who enables them is not operating "normally".

          For these reasons I'm inclined not to make this change.

          Jim Kellerman added a comment -

          There are two cases for batchUpdate that I would consider to be "normal operation" as you have described it, and both could result from a region split: NotServingRegionException and WrongRegionException (in which the row being updated falls outside the row range of the region being served).
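To make the WrongRegionException case concrete, a minimal sketch of the row-range check, assuming the convention that a region spans [startKey, endKey) in unsigned byte order and that an empty end key means "to the end of the table". `RowRange.rowInRange` is a hypothetical helper, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.util.Arrays;

class RowRange {
    /**
     * Hypothetical helper: true if row lies in [startKey, endKey) in unsigned byte order.
     * An empty endKey is treated as "no upper bound" (the last region of a table);
     * an empty startKey compares <= every row, so it needs no special case.
     */
    static boolean rowInRange(byte[] startKey, byte[] endKey, byte[] row) {
        boolean atOrAfterStart = Arrays.compareUnsigned(startKey, row) <= 0;
        boolean beforeEnd = endKey.length == 0 || Arrays.compareUnsigned(row, endKey) < 0;
        return atOrAfterStart && beforeEnd;
    }
}
```

For example, with a region covering [b, m), row "p" fails this check, which in this sketch is the situation a WrongRegionException reports.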

          The other exceptions that might get thrown during a batch update really are problems: region is offline, region is closed, etc. These will remain exceptions.

          batchUpdate will be modified to return a boolean: true if successful; false if the server caught a NotServingRegionException or WrongRegionException. When the client receives false, it needs to "recalibrate".
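A minimal sketch of this proposed server-side contract, under stated assumptions: `BatchUpdateServer` and `RegionSketch` are hypothetical, rows are compared as Strings for brevity, and the two exception classes are local stand-ins for HBase's. The two "normal operation" cases are caught and turned into `false`; a genuine problem (here, a closed region) still propagates as an exception.

```java
import java.io.IOException;
import java.util.Map;

// Local stand-ins for HBase's exceptions so the sketch compiles on its own.
class NotServingRegionException extends IOException {
    NotServingRegionException(String message) { super(message); }
}

class WrongRegionException extends IOException {
    WrongRegionException(String message) { super(message); }
}

// Hypothetical region: rows in [startRow, endRow), compared as Strings for brevity.
class RegionSketch {
    final String startRow;
    final String endRow; // "" means no upper bound
    boolean closed = false;

    RegionSketch(String startRow, String endRow) {
        this.startRow = startRow;
        this.endRow = endRow;
    }

    boolean contains(String row) {
        return row.compareTo(startRow) >= 0 && (endRow.isEmpty() || row.compareTo(endRow) < 0);
    }
}

// Hypothetical server implementing the proposed boolean-returning batchUpdate.
class BatchUpdateServer {
    private final Map<String, RegionSketch> onlineRegions;

    BatchUpdateServer(Map<String, RegionSketch> onlineRegions) {
        this.onlineRegions = onlineRegions;
    }

    /** True on success; false if the client should recalibrate. Real problems still throw. */
    boolean batchUpdate(String regionName, String row) throws IOException {
        try {
            applyUpdate(regionName, row);
            return true;
        } catch (NotServingRegionException | WrongRegionException e) {
            return false; // the two "normal operation" cases, e.g. after a split
        }
    }

    private void applyUpdate(String regionName, String row) throws IOException {
        RegionSketch region = onlineRegions.get(regionName);
        if (region == null) {
            throw new NotServingRegionException(regionName);
        }
        if (region.closed) {
            throw new IOException("region is closed: " + regionName); // remains an exception
        }
        if (!region.contains(row)) {
            throw new WrongRegionException(row + " is outside " + regionName);
        }
        // ... write the row ...
    }
}
```

On `false` the client would look the region up again and resend, as it does after the exception today, but without an exception on the normal path.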

          Jim Kellerman added a comment -

          Ok, I'll buy that argument. We'll change the API for commit only.

          stack added a comment -

          Here are a couple of comments.

          Those that return a value are all reads, I believe. It's unlikely a read will encounter a NotServingRegionException. If one does, the circumstances are probably 'exceptional', such as a crashed region server, so an exception seems appropriate there.

          On the other hand, a stream of writes ineluctably provokes splits and region relocations. IMO, such splits are not unexpected and do not constitute an 'error', nor an 'exceptional' or 'abnormal' condition. Using NSRE to realign the client strikes me as an instance of using exceptions to direct program control flow.

          Seeing exceptions in the logs even though all is working properly is unsettling.

          batchUpdate commit would seem to be the main provocateur of NotServingRegionExceptions during 'normal' operation. Currently it returns void. Would it be odd if just this one method returned a status code that had to be checked? (Its rough counterpart in JDBC has a primitive status facility: http://java.sun.com/j2se/1.5.0/docs/api/java/sql/Statement.html#executeBatch()).
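What checking such a status might look like, sketched with hypothetical per-operation codes in the spirit of the `int[]` returned by JDBC's `Statement.executeBatch()` (which uses its own values, such as `Statement.EXECUTE_FAILED`). `BatchStatus`, `OK` and `REGION_NOT_HERE` are illustrative names, not HBase's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-operation status codes for a batch commit; not an HBase or JDBC API.
final class BatchStatus {
    static final int OK = 0;
    static final int REGION_NOT_HERE = 1;

    private BatchStatus() {}

    /** Returns the indexes of operations the client must re-locate and resend. */
    static List<Integer> needsRelocation(int[] statuses) {
        List<Integer> retry = new ArrayList<>();
        for (int i = 0; i < statuses.length; i++) {
            if (statuses[i] == REGION_NOT_HERE) {
                retry.add(i);
            }
        }
        return retry;
    }
}
```

As with JDBC's update counts, nothing forces the caller to inspect the array, which is the main argument for keeping exceptions.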

          Jim Kellerman added a comment -

          Looking at the (non-deprecated) methods in HRegionInterface, there are two that could return a status: batchUpdate and close (a scanner).

          However, the remainder return a value.

          The Java paradigm seems (to me at least) to be that methods return values or void and not a status code. Errors are handled by throwing exceptions. This is the paradigm that is followed by the other Hadoop RPC protocols.

          A client sending a request to the wrong server, either due to a bug or because the region has moved to another server, feels like an error to me, and throwing an exception to get the client to 'recalibrate' seems OK.

          The server really has no other choice, because input parameters are final (any modifications to them are not returned to the client).

          The only other thing we could do is wrap each returned value in another Writable which always contains a "status message". This does not feel like the right paradigm to me.
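A sketch of that wrapper idea, assuming a hypothetical `StatusReply` type in which every reply carries a one-byte status ahead of its payload. It mimics the `write(DataOutput)` / `readFields(DataInput)` shape of Hadoop's Writable using only java.io, so it compiles without Hadoop and is not a real Hadoop class.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical reply wrapper: status code plus payload, in a Writable-like style.
class StatusReply {
    static final byte OK = 0;
    static final byte NOT_SERVING = 1;

    byte status = OK;
    byte[] value = new byte[0];

    StatusReply() {}

    StatusReply(byte status, byte[] value) {
        this.status = status;
        this.value = value;
    }

    // Serialize: status byte, payload length, payload bytes.
    void write(DataOutput out) throws IOException {
        out.writeByte(status);
        out.writeInt(value.length);
        out.write(value);
    }

    // Deserialize in the same order, reusing this instance (the Writable style).
    void readFields(DataInput in) throws IOException {
        status = in.readByte();
        value = new byte[in.readInt()];
        in.readFully(value);
    }
}
```

Every reply carries an extra status byte, and every caller must unwrap the value and check the status before using it.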

          The only other solution I can think of is to make two RPCs for every one we make today. The first asks the server, "Are you serving this region?", and based on that answer either sends the "real" message or "recalibrates". Seems highly inefficient to me.
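A sketch of the two-RPC alternative that makes its cost explicit, using a hypothetical `CountingServer` that counts the calls it receives: every update costs two round trips even when the cached location is correct. `TwoRpcClient.checkThenUpdate` is also hypothetical.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical server that counts every RPC it receives.
class CountingServer {
    final AtomicInteger rpcCount = new AtomicInteger();
    private final Set<String> onlineRegions;

    CountingServer(Set<String> onlineRegions) {
        this.onlineRegions = onlineRegions;
    }

    // RPC 1: "Are you serving this region?"
    boolean isServing(String regionName) {
        rpcCount.incrementAndGet();
        return onlineRegions.contains(regionName);
    }

    // RPC 2: the "real" message.
    void update(String regionName, String row) {
        rpcCount.incrementAndGet();
        // ... apply the update ...
    }
}

class TwoRpcClient {
    /** Check first, then send: two round trips per update. Returns false if the caller must recalibrate. */
    static boolean checkThenUpdate(CountingServer server, String regionName, String row) {
        if (!server.isServing(regionName)) {
            return false;
        }
        server.update(regionName, row);
        return true;
    }
}
```

The check also does not remove the need for an error path: the region can split or move between the two calls, so the second call can still be misdirected.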

          If there are other ideas I have overlooked, please comment here.


            People

            • Assignee:
              Unassigned
              Reporter:
              stack
            • Votes:
              0
              Watchers:
              3
