Derby / DERBY-5097

testMTSelect(org.apache.derbyTesting.functionTests.tests.store.AutomaticIndexStatisticsMultiTest)junit.framework.AssertionFailedError: failed to get statistics for table MTSEL (#expected=2, timeout=0) on AIX IBM JDK 1.5

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.8.2.2, 10.9.1.0
    • Component/s: Services
    • Labels:
      None
    • Environment:
    • Bug behavior facts:
      Regression Test Failure

      Description

      I need to check if this is intermittent, but noticed the following failure running suites.All on AIX with IBM 1.5.

      1) testMTSelect(org.apache.derbyTesting.functionTests.tests.store.AutomaticIndexStatisticsMultiTest)junit.framework.AssertionFailedError: failed to get statistics for table MTSEL (#expected=2, timeout=0)
      Index statistics for MTSEL
      : <no stats>
      expected:<2> but was:<0>
      at org.apache.derbyTesting.junit.IndexStatsUtil.getStatsTable(IndexStatsUtil.java:236)
      at org.apache.derbyTesting.functionTests.tests.store.AutomaticIndexStatisticsMultiTest.verifyStatistics(AutomaticIndexStatisticsMultiTest.java:143)
      at org.apache.derbyTesting.functionTests.tests.store.AutomaticIndexStatisticsMultiTest.testMTSelect(AutomaticIndexStatisticsMultiTest.java:87)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:79)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:112)

      FAILURES!!!
      Tests run: 11518, Failures: 1, Errors: 0

      1. derby-5097-1a-add_timeout.diff
        1 kB
        Kristian Waagan
      2. derby-5097-2a-increase_timeout.diff
        0.9 kB
        Kristian Waagan

          Activity

          Rick Hillegas added a comment -

          Hi Kathey, does this test fail reliably on AIX? The test may be vulnerable to timing problems--it is suspicious that the test cranks up 5 threads.

          This error may flag a bug in the automatic collection of statistics, however. The test fails because it expects that statistics were collected automatically, and in this case they weren't.

          It seems to me that this bug is not a regression: the failure to collect statistics will not cause wrong results and it will not even cause a query to run worse than it did in 10.7. This may just be evidence that in some situations istat is not delivering its promised benefits in the expected timeframe.

          Kathey Marsden added a comment -

          It is intermittent but I saw it two out of four runs, so could probably get it to pop if you would like me to add some diagnostic code to narrow down the problem. Probably best just to check in some verbosity with derby.tests.debug=true and then I can run it with that.

          I agree it is not a product regression, as statistics collection is a new feature. As with all test failures, though, it would be good to resolve this before the platform tests kick off, if anyone has ideas, because mapping the failures can be a chore when there are many failures and many platforms.

          Kristian Waagan added a comment -

          Haven't spent much time on this, but if this is easily reproducible it may be better to simply use a non-zero timeout value for the utility class that fetches/verifies the statistics (see line 142 in the test class).
          As Rick suggests, it may very well be that the background thread is never allowed to run (it is supposed to be started immediately) due to differences in the OS scheduling policy. It would also be good to enable logging/tracing and compare the output between a working and a failing run.
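
The scheduling hypothesis above can be illustrated with a small, self-contained sketch. It is not Derby code: `BackgroundStatsDemo` and all of its methods are hypothetical, and a latch stands in for "the OS has not yet let the background thread run". It only shows why a check with a zero timeout can observe no statistics while a check that waits observes both rows.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a background statistics updater; not Derby code. */
final class BackgroundStatsDemo {
    private final AtomicInteger statsRows = new AtomicInteger(0);
    private final CountDownLatch scheduled = new CountDownLatch(1);
    private final CountDownLatch done = new CountDownLatch(1);

    /** Starts the worker; it publishes two rows only once it is allowed to run. */
    void start() {
        Thread worker = new Thread(() -> {
            try {
                scheduled.await();      // models the thread not being scheduled yet
                statsRows.set(2);       // two index statistics rows, as in the test
                done.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Models the scheduler finally giving the worker a time slice. */
    void letWorkerRun() {
        scheduled.countDown();
    }

    /** Equivalent of timeout=0: look once, right now. */
    int readImmediately() {
        return statsRows.get();
    }

    /** Equivalent of a non-zero timeout: wait up to the limit, then look. */
    int readWithin(long timeoutMillis) {
        try {
            done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return statsRows.get();
    }

    public static void main(String[] args) {
        BackgroundStatsDemo demo = new BackgroundStatsDemo();
        demo.start();
        System.out.println("immediate read: " + demo.readImmediately());
        demo.letWorkerRun();
        System.out.println("read with timeout: " + demo.readWithin(5000));
    }
}
```

On a single-core machine the real background thread may simply not get a time slice before the test thread performs its check, which is the situation the latch imitates here.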

          Myrna van Lunteren added a comment -

          I thought I'd check how hard this is to reproduce, and with ibm 1.6 (SR9 FP1), against 10.8.1.2 release, on AIX this test failed with this error message 11 out of 100 times.

          Kristian Waagan added a comment -

          I can't reproduce on my machine with a sane build at least.
          Did you use sane or insane?
          How many cpus/cores on the machine?

          If possible, can you add a timeout value to the IndexStatsUtil instance on line 142? For the sake of ruling out "delayed istats", the timeout value can be rather high (ten seconds?).

          Myrna van Lunteren added a comment -

          My previous result was with 10.8.1.2, insane.
          The machine has 1 core, 1 cpu.

          I then tried a sane trunk build with the same jvm, same machine, unmodified, and it failed 77 out of 100 times.

          Then I modified line 142 as you said, with a timeout of 10 seconds, and there were 0 failures out of 100 runs...

          Kristian Waagan added a comment -

          Attaching patch 1a, which adds the timeout that seems to stop the test from failing.

          Committed to trunk with revision 1131030.
          Waiting for verification, then I'll back-port it to 10.8.

          Myrna van Lunteren added a comment -

          I sync-ed up and ran the test 100 times both with insane and sane jars; insane had no failures, sane 1.
          That would work for me, and I think I'm the only one running regular (but not nightly) tests on this platform, and never with sane jars.
          +1 for the backport.

          Kristian Waagan added a comment -

          Thanks, Myrna

          I'll increase the time the test will wait. It retries every 250 ms, so a higher value shouldn't matter. I'm thinking of setting it to five seconds, and then backporting.
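
The retry-until-timeout behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `IndexStatsUtil` implementation: the class `StatsPoller`, the method `awaitCount`, and its parameters are hypothetical names, and the statistics source is simulated with a supplier.

```java
import java.util.function.IntSupplier;

/** Hypothetical polling helper; names and structure are illustrative only. */
final class StatsPoller {
    /**
     * Polls until the supplier reports the expected count or the timeout
     * elapses, sleeping retryMillis between checks.
     * Returns the last observed count.
     */
    static int awaitCount(IntSupplier current, int expected,
                          long timeoutMillis, long retryMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int seen = current.getAsInt();   // always check once, even with timeout 0
        while (seen != expected && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(retryMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            seen = current.getAsInt();
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated source: the statistics only show up on the third check.
        IntSupplier simulated = () -> ++calls[0] >= 3 ? 2 : 0;
        System.out.println(awaitCount(simulated, 2, 5000, 250));
    }
}
```

Because the loop returns as soon as the expected count is seen, a generous timeout only costs time in runs where the statistics never appear, which is why raising it should not slow down passing runs.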

          Myrna van Lunteren added a comment -

          ok...

          Kristian Waagan added a comment -

          Increased timeout with patch 2a.
          Committed to trunk with revision 1133741.

          Kristian Waagan added a comment -

          Backported fix to 10.8 with revision 1133747.
          Ready for verification/close.

          Myrna van Lunteren added a comment -

          Ran once more 100 times with sane trunk jars, and no failures. Thanks, Kristian.


            People

            • Assignee: Kristian Waagan
            • Reporter: Kathey Marsden
            • Votes: 0
            • Watchers: 1
