Derby
  1. Derby
  2. DERBY-4323

test failure in lang.ErrorMessageTest with IBM iseries IBM 1.5

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.5.2.0
    • Fix Version/s: 10.10.1.1
    • Component/s: Test
    • Environment:
      IBM iseries, OS: AS/400; OS version: V5R4M0, IBM 1.5 (1.5.0_13-b05)
    • Urgency:
      Normal
    • Bug behavior facts:
      Regression Test Failure

      Description

      There was this error during the 10.5.2.0 test run - when the ErrorMessageTest was run by itself later there was no problem, and neither did this show up during the ibm 1.6 run:

      1) testDeadlockTimeout(org.apache.derbyTesting.functionTests.tests.lang.ErrorMessageTest)junit.framework.ComparisonFailure: Not a deadlock expected:<...001> but was:<...XL2>

      1. error-stacktrace.out
        11 kB
        Myrna van Lunteren
      2. ErrorMessageTest_derby.log
        914 kB
        Myrna van Lunteren
      3. d4323-1a.diff
        12 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Myrna van Lunteren created issue -
          Hide
          Myrna van Lunteren added a comment -

          derby.log output and error stack trace from the fail directory.

          Show
          Myrna van Lunteren added a comment - derby.log output and error stack trace from the fail directory.
          Myrna van Lunteren made changes -
          Field Original Value New Value
          Attachment ErrorMessageTest_derby.log [ 12414746 ]
          Attachment error-stacktrace.out [ 12414747 ]
          Hide
          Knut Anders Hatlen added a comment -

          The log shows that a timeout exception was thrown instead of a deadlock exception. The lock table does however show that there is a deadlock, so either the deadlock detection algorithm must have failed to detect it, or the deadlock didn't happen until after the deadlock detection algorithm had completed. My guess is on the latter. If it happens again, we could try to increase the deadlock timeout in the test (currently two seconds) to give it more time to get into the expected state before deadlock detection is performed.

          Show
          Knut Anders Hatlen added a comment - The log shows that a timeout exception was thrown instead of a deadlock exception. The lock table does however show that there is a deadlock, so either the deadlock detection algorithm must have failed to detect it, or the deadlock didn't happen until after the deadlock detection algorithm had completed. My guess is on the latter. If it happens again, we could try to increase the deadlock timeout in the test (currently two seconds) to give it more time to get into the expected state before deadlock detection is performed.
          Knut Anders Hatlen made changes -
          Component/s Test [ 11413 ]
          Hide
          Myrna van Lunteren added a comment -

          I saw this again while testing 10.7.1.0 and again with 10.7.1.1 - but both times only with ibm 1.6, not ibm 1.5. Note that it didn't fail during the 10.6.2.0 nor 10.6.2.1 test runs, using the same os & jvm versions.
          In short, it seems somewhat intermittent.

          I am hesitant to increase the deadlocktimeout in the test, wouldn't that make the test slow down always?
          On the other hand, it's bothersome to have another intermittent failure if there's no bug, and I also don't want to add yet another optional flag. I'll experiment modifying the test to force the failure, then decide...

          Show
          Myrna van Lunteren added a comment - I saw this again while testing 10.7.1.0 and again with 10.7.1.1 - but both times only with ibm 1.6, not ibm 1.5. Note that it didn't fail during the 10.6.2.0 nor 10.6.2.1 test runs, using the same os & jvm versions. In short, it seems somewhat intermittent. I am hesitant to increase the deadlocktimeout in the test, wouldn't that make the test slow down always? On the other hand, it's bothersome to have another intermittent failure if there's no bug, and I also don't want to add yet another optional flag. I'll experiment modifying the test to force the failure, then decide...
          Myrna van Lunteren made changes -
          Labels derby_triage10_8
          Urgency Normal
          Hide
          Knut Anders Hatlen added a comment -

          I saw this now with Java SE Embedded 7 running on a 720MHz ARM processor (timing issues are to be expected, as suites.All takes 14 hours to complete on this platform).

          I think it should be possible to reduce the risk of these timing problems without increasing the deadlock timeout. For example, we could compile both of the SELECT statements that we set up for deadlock, before we execute them. This way, the time it takes to compile the queries doesn't eat up parts of the two-second time slot we have available for getting the statements to execute and reach the point where they lock the row. I'll give it a try.

          Show
          Knut Anders Hatlen added a comment - I saw this now with Java SE Embedded 7 running on a 720MHz ARM processor (timing issues are to be expected, as suites.All takes 14 hours to complete on this platform). I think it should be possible to reduce the risk of these timing problems without increasing the deadlock timeout. For example, we could compile both of the SELECT statements that we set up for deadlock, before we execute them. This way, the time it takes to compile the queries doesn't eat up parts of the two-second time slot we have available for getting the statements to execute and reach the point where they lock the row. I'll give it a try.
          Knut Anders Hatlen made changes -
          Assignee Knut Anders Hatlen [ knutanders ]
          Knut Anders Hatlen made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Hide
          Knut Anders Hatlen added a comment -

          I haven't been able to reproduce the failure, but here's a patch that I'm planning to commit, and that I expect to reduce the risk of this happening again.

          Currently, the test has a race between two threads that perform the following steps:

          T1_1: compile query
          T1_2: start executing query
          T1_3: lock row

          T2_1: start new thread
          T2_2: compile query
          T2_3: start executing query
          T2_4: lock row

          The events T1_3 and T2_4 must happen no more than 2 seconds apart.

          The patch makes sure that both of the two threads have completed the query compilation before the race starts. Since there are less steps to be done concurrently, and also the amount of work done concurrently is more evenly distributed, the chance of a big time difference in the last step should be smaller.

          It does this by splitting up Statement.executeQuery() into prepareStatement() + executeQuery() and introducing a barrier before the executeQuery() calls to make sure they start at the same time.

          I factored out the already existing Barrier classes from IndexSplitDeadlockTest and DeadlockDetectionTest for this purpose (since we cannot use java.util.concurrent.CyclicBarrier as long as we support platforms less capable than Java 5).

          Show
          Knut Anders Hatlen added a comment - I haven't been able to reproduce the failure, but here's a patch that I'm planning to commit, and that I expect to reduce the risk of this happening again. Currently, the test has a race between two threads that perform the following steps: T1_1: compile query T1_2: start executing query T1_3: lock row T2_1: start new thread T2_2: compile query T2_3: start executing query T2_4: lock row The events T1_3 and T2_4 must happen no more than 2 seconds apart. The patch makes sure that both of the two threads have completed the query compilation before the race starts. Since there are less steps to be done concurrently, and also the amount of work done concurrently is more evenly distributed, the chance of a big time difference in the last step should be smaller. It does this by splitting up Statement.executeQuery() into prepareStatement() + executeQuery() and introducing a barrier before the executeQuery() calls to make sure they start at the same time. I factored out the already existing Barrier classes from IndexSplitDeadlockTest and DeadlockDetectionTest for this purpose (since we cannot use java.util.concurrent.CyclicBarrier as long as we support platforms less capable than Java 5).
          Knut Anders Hatlen made changes -
          Attachment d4323-1a.diff [ 12550757 ]
          Hide
          Knut Anders Hatlen added a comment -

          Committed revision 1402043.

          It's difficult to say if this is enough to fix the problem, since it's so hard to reproduce. Should we resolve for now and reopen/file a new report if it happens again?

          Show
          Knut Anders Hatlen added a comment - Committed revision 1402043. It's difficult to say if this is enough to fix the problem, since it's so hard to reproduce. Should we resolve for now and reopen/file a new report if it happens again?
          Hide
          Myrna van Lunteren added a comment - - edited

          Thank you for looking into this failure!
          I did not see this failure in the testing for any more recent releases on this OS (i.e. ibm iseries - not for any 10.8 release (or candidate), nor 10.9). I think this coincides with a change in hardware and OS version. I can't remember having seen this anywhere else (I usually add a comment in an open test issue if I see it again, and I have not done that for this one, so it's likely I didn't see it again).
          I think closing is appropriate, and if this occurs again, we can reopen or open a new issue.
          I'm resolving instead of closing, in case we want to backport this.

          Show
          Myrna van Lunteren added a comment - - edited Thank you for looking into this failure! I did not see this failure in the testing for any more recent releases on this OS (i.e. ibm iseries - not for any 10.8 release (or candidate), nor 10.9). I think this coincides with a change in hardware and OS version. I can't remember having seen this anywhere else (I usually add a comment in an open test issue if I see it again, and I have not done that for this one, so it's likely I didn't see it again). I think closing is appropriate, and if this occurs again, we can reopen or open a new issue. I'm resolving instead of closing, in case we want to backport this.
          Myrna van Lunteren made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Fix Version/s 10.10.0.0 [ 12321550 ]
          Resolution Fixed [ 1 ]
          Knut Anders Hatlen made changes -
          Link This issue is related to DERBY-6001 [ DERBY-6001 ]
          Gavin made changes -
          Workflow jira [ 12471731 ] Default workflow, editable Closed status [ 12802077 ]

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Myrna van Lunteren
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development