Derby
  1. Derby
  2. DERBY-4982

Retrying after interrupts in store pops a bug in derbyall/storeall/storeunit/T_RawStoreFactory in some cases

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.8.1.2
    • Fix Version/s: 10.8.1.2
    • Component/s: Store
    • Labels:
      None

      Description

      Cf Myrna's comment on DERBY-4741:

      "I think the latest check-in has caused the following tinderbox failure:

      derbyall/storeall/storeall.fail:unit/T_RawStoreFactory.unit

      see: http://dbtg.foundry.sun.com/derby/test/tinderbox_trunk16/jvm1.6/testing/testlog/SunOS-5.10_i86pc-i386/1061516-derbyall_diff.txt:

                      • Diff file derbyall/storeall/storeunit/T_RawStoreFactory.diff
          • Start: T_RawStoreFactory jdk1.6.0_18 storeall:storeunit 2011-01-20 23:22:23 ***
            2 del
            < – Unit Test T_RawStoreFactory finished
            2 add
            > ran out of time
            > Exit due to time bomb
            Test Failed.
          • End: T_RawStoreFactory jdk1.6.0_18 storeall:storeunit 2011-01-21 00:22:54 ***
            "
            It failed in the nightly runs with ibm 1.6 also (and 1.4.2 and 1.5).
      1. derby-4982.diff
        4 kB
        Dag H. Wanvik
      2. derby-4982.stat
        0.3 kB
        Dag H. Wanvik
      3. derby-4982b.diff
        4 kB
        Dag H. Wanvik
      4. derby-4982b.stat
        0.3 kB
        Dag H. Wanvik
      5. derby-4982c.diff
        4 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.

          Show
          Knut Anders Hatlen added a comment - [bulk update] Close all resolved issues that haven't been updated for more than one year.
          Hide
          Dag H. Wanvik added a comment -

          Committed as svn 1063960, resolving. storeall should now we back in shape in the dailies.

          Show
          Dag H. Wanvik added a comment - Committed as svn 1063960, resolving. storeall should now we back in shape in the dailies.
          Hide
          Dag H. Wanvik added a comment -

          Uploading patch "c" which fixes a bad cut & paste job: the type of message argument is wrong in messages.xml.

          Show
          Dag H. Wanvik added a comment - Uploading patch "c" which fixes a bad cut & paste job: the type of message argument is wrong in messages.xml.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks, Dag! The "b" patch looks good. +1

          Show
          Knut Anders Hatlen added a comment - Thanks, Dag! The "b" patch looks good. +1
          Hide
          Dag H. Wanvik added a comment -

          Uploading revision "b", which adds a more verbose comment, and inserts the
          missing parameter. I reran storeunit, ok.

          Show
          Dag H. Wanvik added a comment - Uploading revision "b", which adds a more verbose comment, and inserts the missing parameter. I reran storeunit, ok.
          Hide
          Dag H. Wanvik added a comment -

          Thanks, Knut.
          Regressions rans fine in insane mode. I'll expand on the comment and fix the error parameter.

          Show
          Dag H. Wanvik added a comment - Thanks, Knut. Regressions rans fine in insane mode. I'll expand on the comment and fix the error parameter.
          Hide
          Knut Anders Hatlen added a comment -

          I think failing in both sane and insane is fine when we attempt to double-latch a page. That's what this code did before DERBY-2107, only difference was that it would be a deadlock exception from the lock manager after a 20 second delay. This should never happen unless we have a bug somewhere, so what we chose to do is mainly a question about how keep the internal API testable. And if we should have a bug that causes double-latching, a message telling that that's what happened is probably more helpful than a hang.

          The comment in BasePage.setExclusive() doesn't say that this is an attempt to double-latch a page anymore. Perhaps expand on the comment to make it clearer why we error out?

          The new message takes a parameter, but no value is provided in the call to newException().

          Show
          Knut Anders Hatlen added a comment - I think failing in both sane and insane is fine when we attempt to double-latch a page. That's what this code did before DERBY-2107 , only difference was that it would be a deadlock exception from the lock manager after a 20 second delay. This should never happen unless we have a bug somewhere, so what we chose to do is mainly a question about how keep the internal API testable. And if we should have a bug that causes double-latching, a message telling that that's what happened is probably more helpful than a hang. The comment in BasePage.setExclusive() doesn't say that this is an attempt to double-latch a page anymore. Perhaps expand on the comment to make it clearer why we error out? The new message takes a parameter, but no value is provided in the call to newException().
          Hide
          Dag H. Wanvik added a comment - - edited

          Uploading a patch which makes store throw a session level exception (the cleanup will possibly release the latch if held by this transaction, not sure) for both sane and insane, instead of assert for sane, and in stead of waiting eternally fo insane, In stead of 08000, I made a new error, DATA_DOUBLE_LATCH_INTERNAL_ERROR for this situation.I chose not to make it database level since the the thread is only dead-locked with itself and we have no sign that the database is compromised. Maybe it is overkill to make a new error for this internal error which is not expected to happen, but according to Knut, waiting for lock timeout is no longer an option: we get an eternal hang. Opinions are welcome as always.

          The test code has been adjusted accordingly.

          Running regressions.

          Show
          Dag H. Wanvik added a comment - - edited Uploading a patch which makes store throw a session level exception (the cleanup will possibly release the latch if held by this transaction, not sure) for both sane and insane, instead of assert for sane, and in stead of waiting eternally fo insane, In stead of 08000, I made a new error, DATA_DOUBLE_LATCH_INTERNAL_ERROR for this situation.I chose not to make it database level since the the thread is only dead-locked with itself and we have no sign that the database is compromised. Maybe it is overkill to make a new error for this internal error which is not expected to happen, but according to Knut, waiting for lock timeout is no longer an option: we get an eternal hang. Opinions are welcome as always. The test code has been adjusted accordingly. Running regressions.
          Hide
          Dag H. Wanvik added a comment - - edited

          This error is due to the way the method T_Unit#checkGetLatchedPage works. This method calls into the internal Derby Page classes to test latching of pages. The structure of this test code mirrors how the code in BasePage#setExclusive works in the following scenario:

          The Derby store code is wrong internally somehow, so an attempt is made to grab the same latch twice. This condition is tested for by checkGetLatchedPage and it treats it differently in sane and insane mode as does store. BasePage#setExclusive does the following:

          • in sane mode, an assert error is thrown
          • in insane mode, the code will do a wait on the monitor and rely on lock time-out to break the stalemate (waiting for itself...)

          Now, the test T_RawStoreFactory#P024 calls T_Unit#checkGetLatchedPage. The latter "knows" that in insane mode, a hang can be expected when it tries
          to acquire a lack for the second time, so to get out of this, it fires off a thread which will shoot down the hanging user thread with an interrupt, so as to get control back: 08000 is expected.

          With the changes in DERBY-4741, interrupting the test thread no longer works, since the wait in setExclusive just retries. This means that in insane mode, the tests hangs until the harness' time bomb goes off. I ran my regression tests in sane mode, which wasn't affected by this logic, and was thus blind-sided.

          Show
          Dag H. Wanvik added a comment - - edited This error is due to the way the method T_Unit#checkGetLatchedPage works. This method calls into the internal Derby Page classes to test latching of pages. The structure of this test code mirrors how the code in BasePage#setExclusive works in the following scenario: The Derby store code is wrong internally somehow, so an attempt is made to grab the same latch twice. This condition is tested for by checkGetLatchedPage and it treats it differently in sane and insane mode as does store. BasePage#setExclusive does the following: in sane mode, an assert error is thrown in insane mode, the code will do a wait on the monitor and rely on lock time-out to break the stalemate (waiting for itself...) Now, the test T_RawStoreFactory#P024 calls T_Unit#checkGetLatchedPage. The latter "knows" that in insane mode, a hang can be expected when it tries to acquire a lack for the second time, so to get out of this, it fires off a thread which will shoot down the hanging user thread with an interrupt, so as to get control back: 08000 is expected. With the changes in DERBY-4741 , interrupting the test thread no longer works, since the wait in setExclusive just retries. This means that in insane mode, the tests hangs until the harness' time bomb goes off. I ran my regression tests in sane mode, which wasn't affected by this logic, and was thus blind-sided.

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Dag H. Wanvik
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development