Derby
  1. Derby
  2. DERBY-3819

'Expected Table Scan ResultSet for T3' in 'test_predicatePushdown(....PredicatePushdownTest)' since 670215 2008-06-21 18:01:08 MEST

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.1.1
    • Fix Version/s: None
    • Component/s: Test
    • Environment:
    • Urgency:
      Normal
    • Bug behavior facts:
      Regression Test Failure

      Description

      'test_predicatePushdown(org.apache.derbyTesting.functionTests.tests.lang.PredicatePushdownTest)junit.framework.AssertionFailedError: Expected Table Scan ResultSet for T3' since 670215 2008-06-21 18:01:08 MEST http://dbtg.thresher.com/derby/test/Daily/UpdateInfo/670215.txt

      The failure is seen on SunOS 5.10 / Sun Jvm 1.6.0.

      See e.g. http://dbtg.thresher.com/derby/test/Daily/jvm1.6/testing/testlog/sol/682186-suitesAll_diff.txt

      The test (suites.All) is run with '-XX:-UseThreadPriorities -XX:MaxPermSize=128M -Xmx256M -d64'.
      When run with '-XX:MaxPermSize=128M -Xmx256M' as is used for the other platforms in this set of tests we do not see a failure.

      The failure was also seen on Solaris Express Community Edition snv_86 X86bits - SunOS 5.11 snv_86 (solN+1) between 670215 and 676638.
      (Run w/ -XX:-UseThreadPriorities -XX:MaxPermSize=128M -Xmx256M -d32)

      1. old_plan.txt
        11 kB
        Kathey Marsden
      2. new_plan.txt
        11 kB
        Kathey Marsden
      3. DERBY-3819.diff_64bitskipasserts
        4 kB
        Myrna van Lunteren
      4. DERBY-3819.diff_64bitskipasserts2
        4 kB
        Myrna van Lunteren

        Activity

        Hide
        Myrna van Lunteren added a comment -

        Committed the patch 2 with revision 807733 to trunk and backported to 10.5 with revision 807749.

        I'll leave this open so someone else can choose to open a new issue for the optimizer directives changes or use this one.

        When the underlying problems are fixed, the asserts need to be reactivated - look in the test for DERBY-3819 references and remove the 'if's.

        Show
        Myrna van Lunteren added a comment - Committed the patch 2 with revision 807733 to trunk and backported to 10.5 with revision 807749. I'll leave this open so someone else can choose to open a new issue for the optimizer directives changes or use this one. When the underlying problems are fixed, the asserts need to be reactivated - look in the test for DERBY-3819 references and remove the 'if's.
        Hide
        Myrna van Lunteren added a comment -

        Thanks, Knut for that suggestion. Uploading a version of that patch where I've used BaseTestCase.traceit instead of duplicating the code in that method.

        I plan to commit this shortly.

        Show
        Myrna van Lunteren added a comment - Thanks, Knut for that suggestion. Uploading a version of that patch where I've used BaseTestCase.traceit instead of duplicating the code in that method. I plan to commit this shortly.
        Hide
        Knut Anders Hatlen added a comment -

        The workaround looks fine to me. Perhaps you'd want to use BaseTestCase.traceit() instead of if (...doTrace())

        { System.out.println(...); }

        so that redirection of the output works.

        Show
        Knut Anders Hatlen added a comment - The workaround looks fine to me. Perhaps you'd want to use BaseTestCase.traceit() instead of if (...doTrace()) { System.out.println(...); } so that redirection of the output works.
        Hide
        Myrna van Lunteren added a comment -

        I'm attaching a patch that will not run the offending asserts (after skipping the one reported, I found there were a few more) on 64 bit platforms.

        I didn't find another test that had this kind of exception 'skipping' code, but I used the same logic as was used in org.apache.derby.iapi.services.cache.ClassSize. Even though there's a slight duplication of code this way I thought it better to keep the testing stuff separate.

        Patch is ready for review.
        I tested by running only the test on 64 bit linux (with 64-bit ibm16) and 32 bit windows xp (with 32 bit ibm16).
        I think this is harmless enough for a workaround, and I'd like to commit tomorrow if there is no feedback...

        After, perhaps this issue should get closed and a new issue opened to adjust the test using optimizer directives and further research to what's really the correct/intended behavior.

        Show
        Myrna van Lunteren added a comment - I'm attaching a patch that will not run the offending asserts (after skipping the one reported, I found there were a few more) on 64 bit platforms. I didn't find another test that had this kind of exception 'skipping' code, but I used the same logic as was used in org.apache.derby.iapi.services.cache.ClassSize. Even though there's a slight duplication of code this way I thought it better to keep the testing stuff separate. Patch is ready for review. I tested by running only the test on 64 bit linux (with 64-bit ibm16) and 32 bit windows xp (with 32 bit ibm16). I think this is harmless enough for a workaround, and I'd like to commit tomorrow if there is no feedback... After, perhaps this issue should get closed and a new issue opened to adjust the test using optimizer directives and further research to what's really the correct/intended behavior.
        Hide
        Dag H. Wanvik added a comment -

        +1 to comment out for now on 64 bits platforms for now.

        I think we could change this test to use the optimizer directives. I agree with Knut that this test should be testing that pushdown happens, not what join strategy is selected. Hopefully other tests cover that...

        Show
        Dag H. Wanvik added a comment - +1 to comment out for now on 64 bits platforms for now. I think we could change this test to use the optimizer directives. I agree with Knut that this test should be testing that pushdown happens, not what join strategy is selected. Hopefully other tests cover that...
        Hide
        Kathey Marsden added a comment -

        I wonder if it would be acceptable to disable this test on 64 bit platforms until we have a fix with a comment that it should be enabled when this issue is fixed. Currently I think it always fails on 64 bit so adds noise to the test runs.

        Show
        Kathey Marsden added a comment - I wonder if it would be acceptable to disable this test on 64 bit platforms until we have a fix with a comment that it should be enabled when this issue is fixed. Currently I think it always fails on 64 bit so adds noise to the test runs.
        Hide
        Ole Solberg added a comment -

        This is seen in our release tests for 10.5.1.0 on 9 (of 13) platforms on 64bit os and 64bit jvms.

        See

        http://dbtg.thresher.com/derby/test/10.5.1.0_RC/jvm1.6-64/testing/Limited/testSummary-757599.html

        http://dbtg.thresher.com/derby/test/10.5.1.0_RC/jvm1.6-64/FailReports/757599_bySig.html

        (I have rerun these a couple of times: also see .../testSummary-757599-2.html, ..../757599_bySig-2.html etc.)

        Show
        Ole Solberg added a comment - This is seen in our release tests for 10.5.1.0 on 9 (of 13) platforms on 64bit os and 64bit jvms. See http://dbtg.thresher.com/derby/test/10.5.1.0_RC/jvm1.6-64/testing/Limited/testSummary-757599.html http://dbtg.thresher.com/derby/test/10.5.1.0_RC/jvm1.6-64/FailReports/757599_bySig.html (I have rerun these a couple of times: also see .../testSummary-757599-2.html, ..../757599_bySig-2.html etc.)
        Hide
        Knut Anders Hatlen added a comment -

        I thought the point of the predicate pushdown test wasn't to verify that the correct join strategy was chosen, but rather to verify that the predicates were correctly pushed down from the outer select statement into the nested select statements. I cannot see that the test does that anymore, though.

        Show
        Knut Anders Hatlen added a comment - I thought the point of the predicate pushdown test wasn't to verify that the correct join strategy was chosen, but rather to verify that the predicates were correctly pushed down from the outer select statement into the nested select statements. I cannot see that the test does that anymore, though.
        Hide
        Kathey Marsden added a comment -

        I started looking at using optimizer directives to get the desired behavior but the approach worries me. What is the value of forcing a hash join just to make sure our check for a hash join passes? Seems a reasonable test for DERBY-PROPERTIES but I don't know that it tests predicate pushdown behavior.

        Perhaps I am way off and it is another directive that I should be using besides joinStrategy. Any thoughts on this would be greatly appreciated.

        Show
        Kathey Marsden added a comment - I started looking at using optimizer directives to get the desired behavior but the approach worries me. What is the value of forcing a hash join just to make sure our check for a hash join passes? Seems a reasonable test for DERBY-PROPERTIES but I don't know that it tests predicate pushdown behavior. Perhaps I am way off and it is another directive that I should be using besides joinStrategy. Any thoughts on this would be greatly appreciated.
        Hide
        Kathey Marsden added a comment -

        Unassigning myself from this issue for now. I don't think I will have time to rewrite the test to use optimizer directives by the time I leave for vacation. I will be out August 16 - September 1.

        Show
        Kathey Marsden added a comment - Unassigning myself from this issue for now. I don't think I will have time to rewrite the test to use optimizer directives by the time I leave for vacation. I will be out August 16 - September 1.
        Hide
        Kristian Waagan added a comment -

        A B wrote:


        But that assumption was wrong, esp. in the case of refSize, which varies from machine to machine (esp. 32-bit vs 64 bit).


        It is my understanding that refSize will be either 4 or 8, depending on whether the VM is 32-bit or 64-bit.
        Assuming 'sun.arch.data.model' can only be 32 or 64, this is enforced by the two first steps in the algorithm:
        a) 'sun.arch.data.model' / 8
        b) 'os.arch'; hardcoded values (4 and 8) if the arch string is known

        Step c), which is the original heuristic, can result in values from 4 and up.
        On my machine it calculated refSize to be -26, but no values smaller than 4 are returned.

        According to Knut Anders (see DERBY-3794), there are other variables used that make the estimates more or less wrong, but they are either constants or based on refSize and we shouldn't see "random values" as long as refSize is stable.

        The refSize calculation changes are documented in DERBY-3731.

        Show
        Kristian Waagan added a comment - A B wrote: But that assumption was wrong, esp. in the case of refSize, which varies from machine to machine (esp. 32-bit vs 64 bit). It is my understanding that refSize will be either 4 or 8, depending on whether the VM is 32-bit or 64-bit. Assuming 'sun.arch.data.model' can only be 32 or 64, this is enforced by the two first steps in the algorithm: a) 'sun.arch.data.model' / 8 b) 'os.arch'; hardcoded values (4 and 8) if the arch string is known Step c), which is the original heuristic, can result in values from 4 and up. On my machine it calculated refSize to be -26, but no values smaller than 4 are returned. According to Knut Anders (see DERBY-3794 ), there are other variables used that make the estimates more or less wrong, but they are either constants or based on refSize and we shouldn't see "random values" as long as refSize is stable. The refSize calculation changes are documented in DERBY-3731 .
        Hide
        A B added a comment -

        > what about using optimizer hints to fix the query plan in the desired shape

        Good point, I think that's the way to go. The original test was committed for 10.1 and we didn't have optimizer hints back then. But for the current trunk, where we now have a separate JUnit test that is not in earlier versions, I think optimizer hints make sense.

        > it would be nice if we didn't have to do this,

        Right, it will probably take a while as that test is quite long. But hopefully the comments preceding each test case will indicate to a sufficient degree what the necessary optimizer overrides would be...hopefully...

        Show
        A B added a comment - > what about using optimizer hints to fix the query plan in the desired shape Good point, I think that's the way to go. The original test was committed for 10.1 and we didn't have optimizer hints back then. But for the current trunk, where we now have a separate JUnit test that is not in earlier versions, I think optimizer hints make sense. > it would be nice if we didn't have to do this, Right, it will probably take a while as that test is quite long. But hopefully the comments preceding each test case will indicate to a sufficient degree what the necessary optimizer overrides would be...hopefully...
        Hide
        Mike Matrigali added a comment -

        what about using optimizer hints to fix the query plan in the desired shape and then check if
        pushdown is happening? The test is about pushdown rather than hash vs scan choices.

        it would be nice if we didn't have to do this, but maybe it would be a reasonable short term fix, and we could log a request to come up with a better test case that would not get different plans on different machines.

        Show
        Mike Matrigali added a comment - what about using optimizer hints to fix the query plan in the desired shape and then check if pushdown is happening? The test is about pushdown rather than hash vs scan choices. it would be nice if we didn't have to do this, but maybe it would be a reasonable short term fix, and we could log a request to come up with a better test case that would not get different plans on different machines.
        Hide
        Kathey Marsden added a comment -

        I wonder if it would work/be appropriate to import ClassSize and have something like:
        if (ClassSize.getRefSize() > 4)
        systemProperties.setProperty("derby.language.maxMemoryPerTable", "2048");

        That way we would be sure we are using the same caclulations as the engine to determine whether we bump the memory.

        Thoughts?

        Show
        Kathey Marsden added a comment - I wonder if it would work/be appropriate to import ClassSize and have something like: if (ClassSize.getRefSize() > 4) systemProperties.setProperty("derby.language.maxMemoryPerTable", "2048"); That way we would be sure we are using the same caclulations as the engine to determine whether we bump the memory. Thoughts?
        Hide
        A B added a comment -

        > I'm not sure if the new plan is ok or not.

        The test comments preceding the point of failure (line 1139) say:

        // Predicate push-down should occur for next two queries. Thus
        // we we should see Index scans for T3 and T4--and this should be
        // the case regardless of the order of the FROM list.

        So the test case is specifically expecting pushdown to happen; the fact that you're seeing a Table Scan instead of an Index Scan comes from the fact that the optimizer has now determined that it's cheaper to do a hash join than a nested loop join, and predicates don't get pushed for hash joins. Since we don't push the predicate we end up with a table scan.

        Similarly, the test comments preceding the original failure reported in this issue say (line 1373):

        // In this query the optimizer will consider pushing, but
        // will find that it's cheaper to do a hash join and thus
        // will not push. So we see hash join with table scan on T3.

        Apparently running with "-d64" makes the optimizer's estimate change: either the hash join estimate goes up or the nested loop estimate (with predicate pushdown) goes down. Either way we end up pushing the predicate instead of doing a hash join, so we fail.

        In general, the test cases for PredicatePushdown were created based on the behavior/estimates seen at the time of writing, and were written with the assumption that the optimizer would make the same decisions on all machines given a noTimeout value of "true". But that assumption was wrong, esp. in the case of refSize, which varies from machine to machine (esp. 32-bit vs 64 bit).

        So I'm not sure how to go about resolving these issues; it seems quite likely that, as Kathey found out, making hardcoded tweaks here and there to maxMemoryPerTable might fix one problem but lead to another (if not on the same machine then perhaps on a different one).

        I wonder if the test harness could figure out if the machine is 32-bit vs 64-bit and then adjust maxMemoryPerTable accordingly? Not sure if there's a reliable way to make that adjustment, but there might be...?

        Show
        A B added a comment - > I'm not sure if the new plan is ok or not. The test comments preceding the point of failure (line 1139) say: // Predicate push-down should occur for next two queries. Thus // we we should see Index scans for T3 and T4--and this should be // the case regardless of the order of the FROM list. So the test case is specifically expecting pushdown to happen; the fact that you're seeing a Table Scan instead of an Index Scan comes from the fact that the optimizer has now determined that it's cheaper to do a hash join than a nested loop join, and predicates don't get pushed for hash joins. Since we don't push the predicate we end up with a table scan. Similarly, the test comments preceding the original failure reported in this issue say (line 1373): // In this query the optimizer will consider pushing, but // will find that it's cheaper to do a hash join and thus // will not push. So we see hash join with table scan on T3. Apparently running with "-d64" makes the optimizer's estimate change: either the hash join estimate goes up or the nested loop estimate (with predicate pushdown) goes down. Either way we end up pushing the predicate instead of doing a hash join, so we fail. In general, the test cases for PredicatePushdown were created based on the behavior/estimates seen at the time of writing, and were written with the assumption that the optimizer would make the same decisions on all machines given a noTimeout value of "true". But that assumption was wrong, esp. in the case of refSize, which varies from machine to machine (esp. 32-bit vs 64 bit). So I'm not sure how to go about resolving these issues; it seems quite likely that, as Kathey found out, making hardcoded tweaks here and there to maxMemoryPerTable might fix one problem but lead to another (if not on the same machine then perhaps on a different one). I wonder if the test harness could figure out if the machine is 32-bit vs 64-bit and then adjust maxMemoryPerTable accordingly? Not sure if there's a reliable way to make that adjustment, but there might be...?
        Hide
        Kathey Marsden added a comment -

        Well bumping the memory was not as easy as I thought. It causes another failure:
        1) test_predicatePushdown(org.apache.derbyTesting.functionTests.tests.lang.PredicatePushdownTest)junit.framework.Asserti
        onFailedError: Expected index scan on T3
        at org.apache.derbyTesting.functionTests.tests.lang.PredicatePushdownTest.test_predicatePushdown(PredicatePushdo
        wnTest.java:1139)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:104)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
        at junit.extensions.TestSetup.run(TestSetup.java:23)
        at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
        at junit.extensions.TestSetup.run(TestSetup.java:23)

        I'm not sure if the new plan is ok or not.

        Show
        Kathey Marsden added a comment - Well bumping the memory was not as easy as I thought. It causes another failure: 1) test_predicatePushdown(org.apache.derbyTesting.functionTests.tests.lang.PredicatePushdownTest)junit.framework.Asserti onFailedError: Expected index scan on T3 at org.apache.derbyTesting.functionTests.tests.lang.PredicatePushdownTest.test_predicatePushdown(PredicatePushdo wnTest.java:1139) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:104) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:23) I'm not sure if the new plan is ok or not.
        Hide
        Kristian Waagan added a comment -

        My wish was to see if we got any errors after the refSize calculation was improved. Obviously we do, and now that they are documented I see two options:
        a) Increase maxMemoryPerTable
        b) Work on the calculation improvements suggested by Knut before we increase maxMemoryPerTable.

        I don't have any strong feelings, so if option a) stops the failures feel free to proceed. It is easy to go back later if needed for investigations.
        I think Mike said the current value is based on old assumptions (i.e. done many years ago), so making it more up to date sounds reasonable in any case.

        Show
        Kristian Waagan added a comment - My wish was to see if we got any errors after the refSize calculation was improved. Obviously we do, and now that they are documented I see two options: a) Increase maxMemoryPerTable b) Work on the calculation improvements suggested by Knut before we increase maxMemoryPerTable. I don't have any strong feelings, so if option a) stops the failures feel free to proceed. It is easy to go back later if needed for investigations. I think Mike said the current value is based on old assumptions (i.e. done many years ago), so making it more up to date sounds reasonable in any case.
        Hide
        Kathey Marsden added a comment -

        I think this will be fixed by changing derby.language.maxMemoryPerTable to 4096, but I know Kristian had some qualms about doing this. I think the actual calculation of refSize has improved since the change and that is what is making the test fail on the 64 bit machines.

        Show
        Kathey Marsden added a comment - I think this will be fixed by changing derby.language.maxMemoryPerTable to 4096, but I know Kristian had some qualms about doing this. I think the actual calculation of refSize has improved since the change and that is what is making the test fail on the 64 bit machines.

          People

          • Assignee:
            Unassigned
            Reporter:
            Ole Solberg
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:

              Development