HBase / HBASE-5757

TableInputFormat should handle as many errors as possible

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.6
    • Fix Version/s: 0.94.1, 0.95.0
    • Component/s: mapreduce
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Prior to HBASE-4196, IOExceptions thrown from the scanner were handled differently in the mapred and mapreduce APIs. The patch for HBASE-4196 unified this handling so that, if an exception is caught, a reconnect is attempted (without bothering the mapred client). After that, HBASE-4269 changed this behavior back, but in both the mapred and mapreduce APIs. The question is: is there any reason not to handle all errors that the input format can handle? In other words, why not reissue the request after any IOException? I see the following disadvantages of the current approach:

      • the client may see exceptions like LeaseException and ScannerTimeoutException if it fails to process all fetched data within the timeout
      • to avoid ScannerTimeoutException, the client must raise hbase.regionserver.lease.period
      • task timeouts are already configured in mapred.task.timeout, so this seems somewhat redundant, because typically one needs to update both of these parameters
      • I don't see any way to get rid of LeaseException (this is configured on the server side)
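The first two points come down to simple arithmetic. The region server renews a scanner's lease when the client comes back for the next batch, and between those calls the mapper is working through up to scanner.caching cached rows; if one batch takes longer than hbase.regionserver.lease.period, the next call fails. The sketch below is a back-of-the-envelope check under the assumption of a constant per-row processing cost. LeaseBudget and its methods are hypothetical helpers, not HBase APIs, and the numbers in main are illustrative only.

```java
/**
 * Hypothetical helper (not an HBase API): the scanner lease is renewed only when the
 * client returns to the region server for the next batch, so one cached batch must be
 * processed within the lease period.
 */
final class LeaseBudget {

    /** Time the mapper spends between two scanner RPCs: one full batch of cached rows. */
    static long batchProcessingMillis(int scannerCaching, long perRowMillis) {
        return (long) scannerCaching * perRowMillis;
    }

    /** True if the lease would expire before the client asks for the next batch. */
    static boolean leaseExpires(int scannerCaching, long perRowMillis, long leasePeriodMillis) {
        return batchProcessingMillis(scannerCaching, perRowMillis) > leasePeriodMillis;
    }

    /** Largest caching value whose batch still fits in the lease period, floored at 1. */
    static int maxSafeCaching(long perRowMillis, long leasePeriodMillis) {
        if (perRowMillis <= 0) {
            return Integer.MAX_VALUE;             // no per-row cost: any caching fits
        }
        long rows = leasePeriodMillis / perRowMillis;
        return (int) Math.max(1, Math.min(rows, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long leasePeriod = 60_000L;  // assumed lease period of 60 s
        long perRow = 200L;          // assumed mapper cost of 200 ms per row
        int caching = 1_000;         // assumed scanner caching
        System.out.println("batch takes " + batchProcessingMillis(caching, perRow) + " ms");
        System.out.println("lease expires: " + leaseExpires(caching, perRow, leasePeriod));
        System.out.println("max safe caching: " + maxSafeCaching(perRow, leasePeriod));
    }
}
```

With these assumed numbers the program prints a 200000 ms batch, "lease expires: true" and a maximum safe caching of 300. That is the coupling described above: either the lease period is raised, or each batch has to be kept short enough, independently of mapred.task.timeout.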

      I think all of these issues would go away if DoNotRetryIOException were not rethrown. On the other hand, handling errors in the InputFormat has the disadvantage that it may hide some inefficiency from the user. For example, if I have a very large scanner.caching and manage to process only a few rows within the timeout, I will end up with a single row being fetched many times (and will not be explicitly notified about this). Could we solve this problem by adding a counter to the InputFormat?
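The proposal (reissue the scan after any IOException instead of rethrowing it, plus a counter so the retries are not silent) can be sketched without HBase on the classpath. Everything here is hypothetical: RowScanner and ScannerFactory stand in for an HBase scanner and for reopening it just after a given row, the bound on restarts is an added safeguard so persistent failures still surface, and this is not the code from the attached patches.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class RestartDemo {
    /** Hypothetical stand-in for a region scanner; returns up to 'caching' rows, empty at the end. */
    interface RowScanner {
        List<String> nextBatch(int caching) throws IOException;
    }

    /** Hypothetical factory: opens a scanner strictly after 'afterRow' (null = table start). */
    interface ScannerFactory {
        RowScanner openAfter(String afterRow) throws IOException;
    }

    /** Reader that reopens the scan after any IOException and counts the restarts. */
    static final class RestartingReader {
        private final ScannerFactory factory;
        private final int caching;
        private final int maxRestarts;
        private final Deque<String> cache = new ArrayDeque<>();
        private RowScanner scanner;
        private String lastRow;    // last row returned to the caller: the resume point
        private long restarts;     // a real reader would report this as a job counter
        RestartingReader(ScannerFactory factory, int caching, int maxRestarts) throws IOException {
            this.factory = factory;
            this.caching = caching;
            this.maxRestarts = maxRestarts;
            this.scanner = factory.openAfter(null);
        }

        /** Returns the next row, or null once the scan is finished. */
        String next() throws IOException {
            while (cache.isEmpty()) {
                try {
                    List<String> batch = scanner.nextBatch(caching);
                    if (batch.isEmpty()) {
                        return null;                      // end of scan
                    }
                    cache.addAll(batch);
                } catch (IOException e) {
                    if (restarts >= maxRestarts) {
                        throw e;                          // bounded: surface persistent failures
                    }
                    restarts++;
                    scanner = factory.openAfter(lastRow); // resume just after the last returned row
                }
            }
            lastRow = cache.poll();
            return lastRow;
        }

        long restarts() {
            return restarts;
        }
    }

    /** Demo source: the first scanner opened fails on its second batch (simulated lease expiry). */
    static RestartingReader newDemoReader(List<String> table, int caching, int maxRestarts)
            throws IOException {
        int[] opened = {0};
        ScannerFactory factory = afterRow -> {
            int start = afterRow == null ? 0 : table.indexOf(afterRow) + 1;
            boolean failOnSecondBatch = opened[0]++ == 0;
            return new RowScanner() {
                private int pos = start;
                private int batches;
                @Override
                public List<String> nextBatch(int n) throws IOException {
                    if (failOnSecondBatch && batches++ == 1) {
                        throw new IOException("simulated lease expiry");
                    }
                    List<String> out = new ArrayList<>(table.subList(pos, Math.min(pos + n, table.size())));
                    pos += out.size();
                    return out;
                }
            };
        };
        return new RestartingReader(factory, caching, maxRestarts);
    }

    static List<String> drain(RestartingReader reader) throws IOException {
        List<String> rows = new ArrayList<>();
        for (String row = reader.next(); row != null; row = reader.next()) {
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<String> table = List.of("row0", "row1", "row2", "row3", "row4",
                "row5", "row6", "row7", "row8", "row9");
        RestartingReader reader = newDemoReader(table, 4, 3);
        System.out.println(drain(reader).size() + " rows, " + reader.restarts() + " restart(s)");
    }
}
```

Running main (Java 9 or later, for List.of) prints "10 rows, 1 restart(s)": the simulated failure after the first batch is absorbed, no row is lost or repeated, and the counter records that a restart happened. Resuming strictly after the last returned row is what keeps rows from being delivered twice; a bare catch-and-retry without that resume point would not.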

      1. 5757-trunk-v2.txt
        9 kB
        Ted Yu
      2. HBASE-5757.patch
        9 kB
        Jan Lukavsky
      3. HBASE-5757.patch
        2 kB
        Jan Lukavsky
      4. hbase-5757-92.patch
        7 kB
        Jonathan Hsieh
      5. HBASE-5757-trunk-r1341041.patch
        9 kB
        Jan Lukavsky

        Issue Links

          Activity

          Lars Hofhansl made changes -
          Fix Version/s 0.94.1 [ 12320257 ]
          stack made changes -
          Fix Version/s 0.95.0 [ 12324094 ]
          Fix Version/s 0.90.7 [ 12319481 ]
          Fix Version/s 0.92.2 [ 12319888 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Fix Version/s 0.94.1 [ 12320257 ]
          Lars Hofhansl made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          stack made changes -
          Component/s mapred [ 12312137 ]
          Jonathan Hsieh made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 0.90.7 [ 12319481 ]
          Fix Version/s 0.92.2 [ 12319888 ]
          Resolution Fixed [ 1 ]
          Jonathan Hsieh made changes -
          Attachment hbase-5757-92.patch [ 12528472 ]
          Jonathan Hsieh made changes -
          Fix Version/s 0.96.0 [ 12320040 ]
          Fix Version/s 0.94.1 [ 12320257 ]
          Jonathan Hsieh made changes -
          Assignee Jan Lukavsky [ je.ik ]
          Ted Yu made changes -
          Attachment 5757-trunk-v2.txt [ 12528448 ]
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Jan Lukavsky made changes -
          Attachment HBASE-5757-trunk-r1341041.patch [ 12528434 ]
          Jan Lukavsky made changes -
          Attachment HBASE-5757.patch [ 12527262 ]
          Jonathan Hsieh made changes -
          Link This issue is related to HBASE-5973 [ HBASE-5973 ]
          Jonathan Hsieh made changes -
          Link This issue is related to HBASE-2161 [ HBASE-2161 ]
          Jan Lukavsky made changes -
          Summary TableInputFormat should handle as much errors as possible TableInputFormat should handle as many errors as possible
          Jan Lukavsky made changes -
          Attachment HBASE-5757.patch [ 12522222 ]
          Jan Lukavsky made changes -
          Field Original Value New Value
          Description
          Jan Lukavsky created issue -

            People

            • Assignee: Jan Lukavsky
            • Reporter: Jan Lukavsky
            • Votes: 0
            • Watchers: 7
