HBASE-506

When an exception has to escape ServerCallable due to exhausted retries, show all the exceptions that led to this situation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0, 0.2.0
    • Component/s: Client
    • Labels:
      None

      Description

      Every so often, we find ourselves trying to debug a problem that happens in HTable where we exhaust all our retries trying to contact the region server hosting the region we want to operate on. Oftentimes the last exception that comes out is something like WrongRegionException, which should just never be the case.

      As a way to improve our debugging capabilities, when we decide to throw an exception out of ServerCallable, let's show not just the last exception but all the exceptions that caused all the retries in the first place. This will help us understand the sequence of events that led to us running out of retries.
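The idea above can be sketched as a retry loop that remembers every failure and reports them all once the attempts run out. This is a minimal illustration only, not the contents of 506.patch: the class `RetryWithHistory`, the exception `AllAttemptsFailedException`, and the method `callWithRetries` are hypothetical names, and the real client code would also pause and re-locate the region between attempts.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

/** Illustrative only: every name here is hypothetical, not taken from 506.patch. */
final class RetryWithHistory {

    /** Thrown once retries run out; carries every failure, oldest first. */
    public static final class AllAttemptsFailedException extends IOException {
        private final List<Throwable> causes;

        public AllAttemptsFailedException(int attempts, List<Throwable> causes) {
            super(describe(attempts, causes));
            this.causes = Collections.unmodifiableList(new ArrayList<>(causes));
        }

        public List<Throwable> getCauses() {
            return causes;
        }

        // Builds a message that lists each attempt's exception in order.
        private static String describe(int attempts, List<Throwable> causes) {
            StringBuilder sb = new StringBuilder(
                "Failed after " + attempts + " attempts. Exceptions:");
            for (int i = 0; i < causes.size(); i++) {
                sb.append("\n  attempt ").append(i + 1).append(": ").append(causes.get(i));
            }
            return sb.toString();
        }
    }

    /** Runs the callable up to maxAttempts times, recording each failure. */
    public static <T> T callWithRetries(Callable<T> callable, int maxAttempts)
            throws IOException {
        List<Throwable> failures = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return callable.call();
            } catch (Exception e) {
                // Keep the failure instead of discarding it on the next retry.
                failures.add(e);
            }
        }
        // Report the whole sequence rather than only the last exception.
        throw new AllAttemptsFailedException(maxAttempts, failures);
    }
}
```

With something along these lines, a caller that catches the final exception can read the full sequence from its message or from `getCauses()`, so a trailing WrongRegionException would appear alongside whatever failures preceded it.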

      1. 506.patch
        2 kB
        Bryan Duxbury
      2. 506-0.1.patch
        2 kB
        Bryan Duxbury

        Activity

        bryanduxbury Bryan Duxbury added a comment -

        Here's a patch to add this functionality for 0.1. (Nearly the same patch would apply to trunk, but let's see if this works first.)

        stack stack added a comment -

        +1 on patch. Falls into the debugging tools category – will help prove/disprove IRC theory that WREs just happen to be the last of a set of retries – so fine to apply to 0.1 branch.

        stack stack added a comment -

        I saw the following when running tests on the patch:

        [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 72.628 sec
        [junit] Test org.apache.hadoop.hbase.TestEmptyMetaInfo FAILED

        Do you see the same?

        bryanduxbury Bryan Duxbury added a comment -

        Don't see that failure in my test suite because the TestEmptyMetaInfo test doesn't exist yet. HBASE-27 hasn't been applied, right?

        stack stack added a comment -

        Right. My test bed was polluted w/ HBASE-27.

        bryanduxbury Bryan Duxbury added a comment -

        Here's the same thing but for trunk.

        bryanduxbury Bryan Duxbury added a comment -

        Please review.

        jimk Jim Kellerman added a comment -

        It is really odd that TestEmptyMetaInfo should fail. Basically, what it does is open the META table, insert a bunch of rows that don't have info:regioninfo in them, and wait for the master to clean them up.

        bryanduxbury Bryan Duxbury added a comment -

        I ran TestEmptyMetaInfo against this patch now that it's committed, and it works fine.

        bryanduxbury Bryan Duxbury added a comment -

        I just committed this to 0.1 and trunk.


          People

          • Assignee: Bryan Duxbury
          • Reporter: Bryan Duxbury