HBase / HBASE-506

When an exception has to escape ServerCallable due to exhausted retries, show all the exceptions that led to this situation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0, 0.2.0
    • Component/s: Client
    • Labels: None

      Description

      Every so often, we find ourselves trying to debug a problem that happens in HTable where we exhaust all our retries trying to contact the region server hosting the region we want to operate on. Oftentimes the last exception that comes out is something like WrongRegionException, which should just never be the case.

      As a way to improve our debugging capabilities, when we decide to throw an exception out of ServerCallable, let's show not just the last exception but all the exceptions that caused all the retries in the first place. This will help us understand the sequence of events that led to us running out of retries.
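
      The idea can be illustrated with a minimal sketch. It is not the contents of 506.patch: the names RetryHistoryDemo, RetryHistoryException, and callWithRetries are hypothetical, and a plain java.util.concurrent.Callable stands in for ServerCallable. The retry loop records every failure and, once the retries are exhausted, throws a single exception whose message lists all of them in order.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class RetryHistoryDemo {

    /** Hypothetical exception that carries every failure seen across the retries. */
    public static class RetryHistoryException extends IOException {
        private final List<Throwable> attempts;

        public RetryHistoryException(int numTries, List<Throwable> attempts) {
            super(buildMessage(numTries, attempts));
            this.attempts = new ArrayList<>(attempts);
        }

        /** Failures in the order they occurred, one per attempt. */
        public List<Throwable> getAttempts() {
            return attempts;
        }

        private static String buildMessage(int numTries, List<Throwable> attempts) {
            StringBuilder sb = new StringBuilder();
            sb.append("Failed after ").append(numTries).append(" attempts. Exceptions:\n");
            for (int i = 0; i < attempts.size(); i++) {
                sb.append("  attempt ").append(i + 1).append(": ")
                  .append(attempts.get(i)).append('\n');
            }
            return sb.toString();
        }
    }

    /**
     * Runs the callable up to numTries times. On success the result is returned
     * immediately; if every attempt fails, all recorded failures are reported
     * together instead of only the last one.
     */
    public static <T> T callWithRetries(Callable<T> callable, int numTries) throws IOException {
        List<Throwable> failures = new ArrayList<>();
        for (int tries = 0; tries < numTries; tries++) {
            try {
                return callable.call();
            } catch (Exception e) {
                failures.add(e); // remember this failure, then retry
            }
        }
        throw new RetryHistoryException(numTries, failures);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        try {
            callWithRetries(() -> {
                calls[0]++;
                // Simulate a different failure on the final attempt.
                if (calls[0] < 3) {
                    throw new IOException("region server not reachable");
                }
                throw new IllegalStateException("wrong region");
            }, 3);
        } catch (IOException e) {
            // The message shows the whole sequence, not just the last failure.
            System.err.println(e.getMessage());
        }
    }
}
```

      With this shape, a caller who only sees the final failure can still read the earlier ones from the message or from getAttempts(), which is the sequence-of-events view the description asks for.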

      1. 506.patch (2 kB, Bryan Duxbury)
      2. 506-0.1.patch (2 kB, Bryan Duxbury)

        Activity

        Transition                     Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available        16h 52m                 1                 Bryan Duxbury   12/Mar/08 17:13
        Patch Available -> Resolved    3d 4h 14m               1                 Bryan Duxbury   15/Mar/08 21:27
        Resolved -> Closed             159d 23h 45m            1                 Jim Kellerman   22/Aug/08 22:13
        Jim Kellerman made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Bryan Duxbury made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Bryan Duxbury added a comment -

        I just committed this to 0.1 and trunk.

        Bryan Duxbury added a comment -

        I ran TestEmptyMetaInfo against this patch now that it's committed, and it works fine.

        Jim Kellerman added a comment -

        That is really odd that TestEmptyMetaInfo should fail. Basically what it does is open the META table, stick in a bunch of rows that don't have info:regioninfo in them and wait for the master to clean them up.

        Bryan Duxbury made changes -
        Assignee Bryan Duxbury [ bryanduxbury ]
        Bryan Duxbury made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Bryan Duxbury added a comment -

        Please review.

        Bryan Duxbury made changes -
        Attachment 506.patch [ 12377710 ]
        Bryan Duxbury added a comment -

        Here's the same thing but for trunk.

        stack added a comment -

        Right. My test bed was polluted w/ HBASE-27.

        Bryan Duxbury added a comment -

        Don't see that failure in my test suite because the TestEmptyMetaInfo test doesn't exist yet. HBASE-27 hasn't been applied, right?

        stack added a comment -

        I saw the following when running the tests on the patch:

        [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 72.628 sec
        [junit] Test org.apache.hadoop.hbase.TestEmptyMetaInfo FAILED

        Do you see the same?

        stack added a comment -

        +1 on patch. Falls into the debugging tools category – will help prove/disprove the IRC theory that WREs (WrongRegionExceptions) just happen to be the last of a set of retries – so fine to apply to 0.1 branch.

        Bryan Duxbury made changes -
        Field Original Value New Value
        Attachment 506-0.1.patch [ 12377657 ]
        Bryan Duxbury added a comment -

        Here's a patch to add this functionality for 0.1. (Nearly the same patch would apply to trunk, but let's see if this works first.)

        Bryan Duxbury created issue -

          People

          • Assignee: Bryan Duxbury
          • Reporter: Bryan Duxbury
          • Votes: 0
          • Watchers: 0
