Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client

      Description

      Right now almost all retry behavior is governed by a single parameter that determines the number of retries. In a few places there are also configuration settings for the number of milliseconds to sleep between retries. This isn't quite flexible enough. If we refactor some of the retry logic into a RetryPolicy class, we could introduce exponential backoff where appropriate, clean up some of the config, etc.
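      A rough sketch of the kind of abstraction being asked for follows. The interface name, method signatures, and settings are hypothetical, not existing HBase API; exponential backoff is just one policy such an interface could express.

        // Hypothetical sketch only -- names and signatures are illustrative,
        // not part of the current client.
        public interface RetryPolicy {
          /** Decide whether the operation should be attempted again. */
          boolean shouldRetry(Throwable failure, int attempts);

          /** How long to sleep before the next attempt, in milliseconds. */
          long getPauseMillis(int attempts);
        }

        // One possible policy: exponential backoff capped at a maximum pause.
        class ExponentialBackoffRetryPolicy implements RetryPolicy {
          private final int maxAttempts;
          private final long basePauseMillis;
          private final long maxPauseMillis;

          ExponentialBackoffRetryPolicy(int maxAttempts, long basePauseMillis,
              long maxPauseMillis) {
            this.maxAttempts = maxAttempts;
            this.basePauseMillis = basePauseMillis;
            this.maxPauseMillis = maxPauseMillis;
          }

          public boolean shouldRetry(Throwable failure, int attempts) {
            return attempts < maxAttempts;
          }

          public long getPauseMillis(int attempts) {
            // basePause * 2^attempts, capped so a long outage does not sleep forever
            return Math.min(maxPauseMillis, basePauseMillis << Math.min(attempts, 30));
          }
        }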

          Activity

          Todd Lipcon added a comment -

          One particular example:

          when processing a batch put, HConnectionManager uses the same retry count for the outer loop (number of batches to attempt) and the inner loop (number of times to retry an individual region server). For each region server, it treats socket layer exceptions and application layer exceptions the same with regard to retries.

          This is not ideal: if I kill a region server while running an import, I find one of these two things happens:

          • if I leave the number of retries configured at the default, the "outer loop" runs out of retries before all of the regions have been reassigned, thus the multiput fails
          • if I configure the number of retries to 80 (meaning 80 seconds at the default sleep time of 1sec for most operations) then I actually end up retrying the same RS for 80 seconds without even refreshing the locations (the inner loop doesn't refresh).

          I would like the fine grained configuration to be able to say:

          • never retry a "connection refused" or "no route to host" error in the inner loop - I'd rather go back to meta to see if it's been reassigned
          • in this particular case, same if I get NotServingRegionException - no sense retrying!
          • for other errors, it may be worth one or two retries [not sure what errors those might be!]

          For the outer loop I'd really like enough retries to wait around for at least 90 seconds, to give the master time to notice the dead RS and reassign the regions.

          This is just one example, but there are other places where being able to specify a more complete retry policy would help.
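          As a rough sketch of the classification described above (the class and method names here are made up for illustration; only the exception types are real):

            import java.net.ConnectException;
            import java.net.NoRouteToHostException;

            import org.apache.hadoop.hbase.NotServingRegionException;

            // Hypothetical inner-loop policy: decides whether to retry the same
            // region server or give up and go back to .META. for a fresh location.
            class InnerLoopRetryPolicy {
              private final int maxRetriesForOtherErrors;

              InnerLoopRetryPolicy(int maxRetriesForOtherErrors) {
                this.maxRetriesForOtherErrors = maxRetriesForOtherErrors;
              }

              boolean shouldRetrySameServer(Throwable t, int attempts) {
                if (t instanceof ConnectException || t instanceof NoRouteToHostException) {
                  return false; // server is unreachable; re-look-up the region instead
                }
                if (t instanceof NotServingRegionException) {
                  return false; // region has moved; retrying the same server is pointless
                }
                // Other errors may be worth a small, configurable number of retries.
                return attempts < maxRetriesForOtherErrors;
              }
            }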

          stack added a comment -

          Marking critical and for 0.20.5 and 0.21. We can punt later but this seems kinda important to figure out.

          Todd Lipcon added a comment -

          Unassigning from myself since I'm not working on this currently. HBASE-2421 solved the major issue I was running into, but I think this JIRA is still important as a general topic - we should expose timeouts, not retry counts, as the user-visible idea.

          Miklos Kurucz added a comment -

          I am having problems with setting hbase.client.retries.number = 1. In that case the cache will never be updated.

          Jean-Daniel Cryans asked me:
          "Yeah I understand that retries are unusable at that level, but you still want retries in order to be able to recalibrate the .META. cache right?"
          My answer is that I want HCM to update the cache when it is necessary, but I don't think that should only happen during retries.
          It seems to me that the two things can be separated:
          Whenever a NotServingRegionException is caught, the cache entry should be cleared.
          When an exception is caught that is not related to bad cache entries, the cache should not be cleared.

          The current exception handling is done this way if I'm correct:

          Server sends NSRE

          client/ScannerCallable.java:84
            84   if (ioe instanceof NotServingRegionException) {
            85     // Throw a DNRE so that we break out of cycle of calling NSRE
            86     // when what we need is to open scanner against new location.
            87     // Attach NSRE to signal client that it needs to resetup scanner.
            88     throw new DoNotRetryIOException("Reset scanner", ioe);

          client/HConnectionManager.java:1063
          1063   } catch (Throwable t) {
          1064     t = translateException(t);

          client/HConnectionManager.java:1431
          1431   if (t instanceof DoNotRetryIOException) {
          1432     throw (DoNotRetryIOException)t;

          client/HTable.java:824
           824   } catch (DoNotRetryIOException e) {
                 ...
           835     Throwable cause = e.getCause();
           836     if (cause == null || !(cause instanceof NotServingRegionException)) {
           837       throw e;
           838     }
           839     // Else, its signal from depths of ScannerCallable that we got an
           840     // NSRE on a next and that we need to reset the scanner.

          And after resetting the scanner we get to:
          client/HTable.java:776
          776 callable = getScannerCallable(localStartKey, nbRows);
          777 // Open a scanner on the region server starting at the
          778 // beginning of the region
          779 getConnection().getRegionServerWithRetries(callable);

          Which will still use the bad cache entry first, and run into the same problem?

          Perhaps I am mistaken and somewhere we do delete the bad cache entry, but in the case where the server sends java.net.ConnectException:

          client/HConnectionManager.java:1063
          1063   } catch (Throwable t) {
          1064     t = translateException(t);
          1065     exceptions.add(t);
          1066     if (tries == numRetries - 1) {
          1067       throw new RetriesExhaustedException(callable.getServerName(),
          1068         callable.getRegionName(), callable.getRow(), tries, exceptions);

          we will not delete the bad entry.
          I understand that ConnectException can be thrown for various reasons; still, deleting the cache entry might be a good idea even when hbase.client.retries.number = 1.
          Also, retrying a "connection refused" does not make sense to me either.
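          A sketch of that separation might look like the following; clearCachedLocation() is a hypothetical hook, not an existing method, and the exception list covers only the obvious candidates:

            import java.net.ConnectException;
            import java.net.NoRouteToHostException;

            import org.apache.hadoop.hbase.NotServingRegionException;

            abstract class StaleLocationHandler {
              /** Hypothetical hook that drops the cached .META. entry for this row. */
              abstract void clearCachedLocation(byte[] tableName, byte[] row);

              void handleFailure(Throwable t, byte[] tableName, byte[] row) {
                if (t instanceof NotServingRegionException
                    || t instanceof ConnectException
                    || t instanceof NoRouteToHostException) {
                  // The cached location points at the wrong (or dead) server: drop it
                  // so the next attempt -- even with hbase.client.retries.number = 1 --
                  // re-reads the location from .META.
                  clearCachedLocation(tableName, row);
                }
                // Errors unrelated to stale locations leave the cache untouched.
              }
            }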

          stack added a comment -

          Bulk move of 0.20.5 issues into 0.21.0 after vote that we merge branch into TRUNK up on list.

          Dave Latham added a comment -

          I have a use case where I'd like to perform a Get with a timeout of 500 ms or so. If HBase is unable to retrieve the result within that time, an exception or null result is fine, though most of the time, I expect it to work fine. The same process has other uses where the normal retry policy makes sense, but in this case, it has to be fast, or nothing. Right now I'm forced to create a separate thread pool to handle these requests, and the threads can all get tied up trying to hit one bad regionserver, for example.

          Not sure if this case will be solved by this JIRA, but it seems related.
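          For reference, that workaround looks roughly like the sketch below: bound a single Get by running it on an executor and abandoning it on timeout. The wrapper is illustrative only, and it shows the weakness described above -- the abandoned call can still tie up the executor thread on a bad region server.

            import java.util.concurrent.Callable;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Future;
            import java.util.concurrent.TimeUnit;
            import java.util.concurrent.TimeoutException;

            import org.apache.hadoop.hbase.client.Get;
            import org.apache.hadoop.hbase.client.HTable;
            import org.apache.hadoop.hbase.client.Result;

            class BoundedGet {
              /** Returns the Result, or null if it could not be fetched within the deadline. */
              static Result getWithDeadline(final HTable table, final Get get,
                  long deadlineMillis, ExecutorService executor) throws Exception {
                Future<Result> future = executor.submit(new Callable<Result>() {
                  public Result call() throws Exception {
                    return table.get(get);
                  }
                });
                try {
                  return future.get(deadlineMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                  // Cancellation does not reliably interrupt the underlying RPC, so the
                  // executor thread may stay tied up on the bad region server.
                  future.cancel(true);
                  return null;
                }
              }
            }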

          Todd Lipcon added a comment -

          Yep, that's the aim of this JIRA. Real world latency SLAs are expressed in milliseconds, not in retry counts, and we should let people push those SLAs in their operations.

          Jean-Daniel Cryans added a comment -

          This issue is very important, but I don't see this being worked on before October. Can we punt?

          Jonathan Gray added a comment -

          The new master has a completely different way of doing retry stuff (just a timeout, not retries + pause). Not sure if we are going to completely rework the client in time for 0.90, but the new stuff like CatalogTracker makes this much, much easier.

          stack added a comment -

          Moving out of 0.90 – new 'feature'.

          Dave Latham added a comment -

          Here's hoping it can get picked up for 0.92

          Lars Hofhansl added a comment -

          I don't see anybody working on this for 0.94. Pull back if you feel otherwise.

          stack added a comment -

          We need this bad but not being worked on currently that I know of... moving this out of 0.96.0


            People

            • Assignee:
              Unassigned
            • Reporter:
              Todd Lipcon
            • Votes:
              1
            • Watchers:
              9
