|
[
Permlink
| « Hide
]
Knut Anders Hatlen added a comment - 18/Dec/07 12:26 PM
Marking the issue as a regression, since it works correctly on Derby 10.1.3.1 and earlier releases.
Regression occurs as of svn # 423754, for
Attaching Also attaching a quick patch to implement the solution mentioned in 1) I have not yet added an appropriate test case to the regression suites. 2) There was one failure in derbyall when I ran with this change (SpaceTable.sql). I still need to investigate the failure to see what might be going wrong. suites.All ran cleanly. Not sure when I'll have time to follow-up, but I will post when I know more... Hi Army, thanks for looking at this issue (I marked it as assigned to you).
Your assessment makes complete sense to me; thanks for writing the issues up so clearly and completely! Two questions occurred to me as I read your writeup: 1) Why is it that TAB_D is dependent on HOJ but TAB_V is not? Both of the exists subqueries look very similar to me; they each seem to be correlated to TAB_B in the HOJ, and isn't that where the dependency arises? It isn't really directly relevant to the bug, I was just hoping you could educate me on why the one subquery was dependent on the B-C-A join but the other wasn't. 2) The way I read your patch, you basically caused steps (3) and (4) to be done as part of step (1b), which makes complete sense to me. But it surprised me a bit that you didn't seem to *move* that code; rather you inserted *new* code into step (1b). That is, it now seems like there are multiple places in OptimizerImpl.getNextPermutation() which all seem to be doing the same thing: - your new code to be inserted at line 589 or so - The hunk of code at line 809 or so, starting with the comment "We are going to try an optimizable at the current join order position" - The hunk of code at line 1093 or so, starting with the comment "Clear the assigned table map" I'm afraid I don't really have a very clear comment to make here; I'm just observing that getNextPermutation() appears to compute the "pullMe" optimizable twice, and in one case it clears the assignedTableMap bits but in the other it doesn't, and your patch adds a 3rd place where we compute the pullMe optimizable (and clear the assignedTableMap bits). Perhaps this question puts my finger most closely on my confusion: Why don't we need to clear the assigned table map bits at line 809? Thanks for the great questions, Bryan.
> 1) Why is it that TAB_D is dependent on HOJ but TAB_V is not? I admit I don't know. The process of flattening EXISTS subqueries into specialized FromBaseTables that are called "existsBaseTables" is one I have not investigated. I know it happens, and I know that an "exist base table" is one of the primary reasons that dependencies exist, at least according to the comment in FromBaseTable.legalJoinOrder(): public boolean legalJoinOrder(JBitSet assignedTableMap) { // Only an issue for EXISTS FBTs if (existsBaseTable) { /* Have all of our dependencies been satisfied? */ return assignedTableMap.contains(dependencyMap); } return true; } As you say, it seems like the two EXISTS subqueries are pretty similar, but I do not know what is it that causes a dependency on TAB_D only. I haven't spent time in that particular piece of the code... > 2) [...] It surprised me a bit that you didn't seem to *move* that > code; rather you inserted *new* code into step (1b). I agree with you on this. I was originally thinking to reorg the code a bit, but then I just wanted to see if I was on track by making the "quick fix" and checking the results. It solved the reported problem so I posted it; but yes, the patch is incomplete and a more formal reorganization of the relevant methods would be a good idea. I don't know how much time I'll have to look at this over the next many days, so I just did a quick "drop" posting of what I found--which is one of the reasons I hadn't officially assigned myself to the issue :) But you bring up a good point. > Why don't we need to clear the assigned table map bits at line 809? I don't think this is necessary because the other pullMe code (line 1093) will eventually (in a round-a-bout way) clean up after the missing "clear assigned table map" from 809--at least in most cases. But adding the appropriate code to line 809 may fix this bug and render the other use unnecessary, as well. If that's true then that would probably be the best final approach. And of course, there's still the SpaceTable.sql issue to investigate... Attaching d3288_v2.patch, which is a first attempt a complete solution for this issue. The patch does the following:
1. Move all of the existing "pull optimizable" processing out of the getNextPermutation() method and into its own method, pullOptimizableFromJoinOrder(), for the sake of clarity. 2. Add a call to pullOptimizableFromJoinOrder() that updates assignedTableMap to correctly reflect the pulling of the optimizable in question. 3. Make the getNextPermutation() metod call the new method (pullOptimizableFromJoinOrder()) **BEFORE** trying to determine what the next legal optimizable for the join order can be. Changes 1 thru 3 above fix the query as reported in this Jira. But with those changes, store/SpaceTable.sql was failing with an ASSERT failure just as it did for d3288_incomplete_v1.patch. I looked into this and eventually tracked it down to the fact that the FromVTI class misuses the "referencedTableMap" field that it inherits from ResultSetNode.java. More specifically, that field is intended to hold table numbers for all FromTables which appear at or _beneath_ a given FromTable in the query tree. But for FromVTI, the referencedTableMap was holding table numbers for all FromTables which were referenced by the _arguments_ to the FromVTI. As an example, take the following query (which is pulled from store/SpaceTable.sql): select conglomeratename, isindex, numallocatedpages, numfreepages, pagesize, estimspacesaving from SYS.SYSSCHEMAS s, SYS.SYSTABLES t, new org.apache.derby.diag.SpaceTable(SCHEMANAME,TABLENAME) v where s.SCHEMAID = t.SCHEMAID and s.SCHEMANAME = 'APP' order by conglomeratename; In this query SpaceTable will be parsed as a FromVTI and will get its own table number; SYSSCHEMAS and SYSTABLES will each get its own table number, as well. That said, the referencedTableMap for the FromVTI should only contain the table number of the FromVTI itself--because there are no other FromTables beneath the FromVTI in the query tree (esp. FromVTI doesn't have any children nodes that are instances of FromTable). But with the current codeline, the arguments for SpaceTable will be processed and their table numbers will be added to FromVTI's referencedTableMap, so that FromVTI's referenced table map will contain: { SYSSCHEMAS, SYSTABLES, SpaceTable } instead of { SpaceTable } This confuses the optimizer's dependency checking, which assumes that a FromTable's referenced map only contains table numbers for itself and any FromTable children. The result of this confusion is a join order that puts SpaceTable _before_ SYSSCHEMAS and SYSTABLES, which means that attempts to generate the FromVTI wll fail because the tables that it references have not yet been generated. Note that FromVTI _does_ have a dependency on SYSSCHEMAS and SYSTABLES as a result of its arguments--that part is correct. And the current code _does_ try to enforce that dependency. But the misuse of referencedTableMap causes the current code to do the wrong thing. This is because the assignedTableMap used by the Optimizer sets/unsets bits based on the referenced table map. So we end up in a situation like this: Join order: { SYSTABLES, SYSSCHEMAS, SpaceTable } assignedTableMap: { SYSTABLES, SYSSCHEMAS, SpaceTable } -- Pull SpaceTable from the join order. We do an XOR between assignedTableMap and SpaceTable's referenced table map. Since SpaceTable's referenced table (incorrectly) includes bits for SYSSCHEMAS and SYSTABLES, the XOR yields: Join order: { SYSTABLES, SYSSCHEMAS, -- } assignedTableMap: { } // this is WRONG! Note how the assignedTableMap is now empty. This is wrong-- after the pull there are still two tables in the join order, so assignedTablemap should reflect those tables. But due to the FromVTI's incorrect referenced map, assignedTableMap no longer matches the join order. -- Pull SYSSCHEMAS from the join order. We do an XOR between assignedTableMap and SYSSCHEMA's referenced table map, yielding: Join order: { SYSTABLES, --, -- } assignedTableMap: { SYSSCHEMAS } Note how assignedTableMap is STILL wrong here. Since we (incorrectly) cleared assignedTableMap above, the XOR with SYSSCHEMAS actually _added_ SYSSCHEMAS to the assigned table map, which is the opposite of what were supposed to do. Since SYSSCHEMAS is no longer in the join order, it should not show up in assignedTableMap. -- Attempt to place SpaceTable in the join order. Since it depends on { SYSSCHEMAS, SYSTABLES } and assignedTableMap does not include both of those, we can't place it. So we'll pull again... -- Pull SYSTABLES from the join order. We do an XOR between assignedTableMap and SYSTABLES's referenced table map, yielding: Join order: { --, --, -- } assignedTableMap: { SYSSCHEMAS, SYSTABLES } So our assignedTableMap is completely wrong at this point. Since we (incorrectly) cleared assignedTableMap when we pulled SpaceTable, our assigned table map now indicates that SYSSCHEMAS and SYSTABLES are both already in the join order, when in truth the join order is EMPTY at this point. -- Attempt to place SpaceTable in the join order. Since it depends on { SYSSCHEMAS, SYSTABLES } and since assignedTableMap incorrectly includes both of those, the optimizer will place SpaceTable into the first position in the join order. We end up with: Join order: { SpaceTable, --, -- } assignedTableMap: { SYSSCHEMAS, SYSTABLES, SpaceTable } This becomes the "cheapest" join order that the optimizer finds, so this is what Derby will try to generate. But since SpaceTable depends on SYSSCHEMAS and SYSTABLES, which will only be generated _after_ SpaceTable (because FromTables are generated in the order in which they appear in the join order), the generation code for SpaceTable will fail. All of that to say that FromVTI should not include table numbers for its arguments in its referencedTableMap. So with that in mind, the d3288_v2.patch also does the following: 4. Fix FromVTI to correctly set up its referencedTableMap and dependency map, without mixing the two together. 5. Remove some assignedTableMap "clear" (XOR) logic that appears to only have been necessary due to the bug in FromVTI--at least as far as I can tell. Removing this logic makes it so that there is now a _single_ place in the code where we "add to" the assigned table map (namely, when placing an optimizable), and there is a _single_ place where we "remove from" the assigned table map (namely, when pulling an optimizable). 6. Add the repro for this issue to the existing lang/subqueryFlattening.sql tests, since that's where EXISTS flattening is currently tested. I ran derbyall and suites.All with an earlier version of this patch and everything ran cleanly. Since then I have added a few ASSERT statements to sanity check the contents of assignedTableMap at certain points during the join order processing. I have not yet run the regression suites with these ASSERTs in place, but plan to do so shortly. In the meantime, if anyone has any comments on the patch as posted, I'd be glad to hear them. Note to reviewers: The patch itself is 929 lines, but the vast majority of that comes from the relocation of "pull optimizable" processing into its own method. I ran the regression tests (derbyall and suites.All) with Sun JVM 1.6.0_04 on Solaris x86 and saw no failures. I have not reviewed the code.
Committed d3288_v2.patch to trunk with svn # 618841:
URL: http://svn.apache.org/viewvc?rev=618841&view=rev Setting Fix In to 10.4 and unchecking the "Patch Available" flag. I ran the "svn merge" command to port this change back to 10.3, but the command failed due to a conflict in FromVTI.java caused by changes in were made in 10.4 (as part of
I manually resolved the conflict and created "d3288_10_3_merge.patch" for the port to 10.3. With this patch applied I ran derbyall and suites.All on Linux with ibm142 and saw no failures. Committed d3288_10_3_merge.patch to 10.3 with svn # 619932:
URL: http://svn.apache.org/viewvc?rev=619932&view=rev Gerald, if you have time to try out the fix, can you do so and report back? Army, I tried out the fix you provided: Works fine. Great job!
I don't know if fixes for wrong results *regressions* really need a release note or not, but I'm attaching one here anyway. I am *not* checking the "Existing Application Impact" box as I don't know what the rules are for regressions. If it is agreed that release notes are good for regression fixes, as well, then someone can check the appropriate box to have the release note picked up by the generator tool. Note: the html file will have to be "scrubbed" in order to play nicely with the release note tool.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||