|
A B made changes - 09/Feb/06 09:24 AM
A B made changes - 13/Feb/06 05:45 PM
A B made changes - 14/Feb/06 03:52 AM
I have been reading Army's (A B''s) document, and I have some questions.
How could a predicate be pushable to only one side of a union? Can you provide an example of a predicate that can be pushed only to one side? Why are you only dealing with join predicates? It would also be useful to push simple search arguments (i.e. a column compared to a constant), and this case might be more common than join predicates: create view v as select * from t1 union select * from t2 select * from v where c = 1 I would prefer to see any type of predicate pushed into a union - even those containing complex expressions. This might be hard to implement, though, as I don't know whether the cloning methods are implemented for the entire ValueNode hierarchy. Pasting my reply to Jeff's question here (Jira was down this morning, so I just sent the reply to derby-dev):
Jeff Lichtman (JIRA) wrote: > [ http://issues.apache.org/jira/browse/DERBY-805?page=comments#action_12366598 ] > Jeff Lichtman commented on > ------------------------------------- > > I have been reading Army's (A B''s) document, and I have some questions. Thank you very much for taking the time to read it. I really appreciate it. > How could a predicate be pushable to only one side of a union? Can you provide an example of a predicate that can be pushed only to one side? If we take something along the lines of: select ... from t2, (select * from t1 union values (1,2), (3,4), (5,6)) X1 (a,b) where X1.a = t2.i; In this case the predicate X1.a = t2.i could be pushed to the left ("select * from t1") and used when reading T1, but couldn't be pushed to the VALUES clause because there's no underlying table. If we pushed to the left but not to the right, then removed it from the UnionNode's predList--which is the restrictionList of the ProjectRestrictNode above the UnionNode--the rows from the right would remain unqualified and thus we'd return incorrect results (more rows that intended). > Why are you only dealing with join predicates? It would also be useful to push simple search arguments (i.e. a column compared to a constant), and this case might be more common than join predicates: Actually, when I first made the changes described in the document, I pushed any predicate that was a binary relational operator with a column reference on at least one side and a query-invariant value (i.e. constant or parameter) on whichever side was not a column reference (if either). This covered the case of a column compared to a constant. All of my changes worked, so from a logical/coding perspective we could indeed do just that. However, I then put in the join predicate limitation because it seemed to me (based on very brief inspection) that the case of a comparison with a constant was covered by Satheesh's fix for <Jeff Lichtman> > BTW, the business of pushing and pulling predicates during optimization can be > hard to understand and debug, so maybe it's best to only handle the simple > cases and do it during preprocessing. <Satheesh> The pushing is done before optimization... during preprocessing. [ ... ] You bring up a *great *point about pushing join predicates. I am not implementing this for UnionNode. And of the course, the "summary" of So given that, I figured the goal for /* ** Pull the predicates at from the optimizable and put ** them back in the predicate list. ** ** NOTE: This is a little inefficient because it pulls the ** single-table predicates, which are guaranteed to always ** be pushed to the same optimizable. We could make this ** leave the single-table predicates where they are. */ pullMe.pullOptPredicates(predicateList); So it seemed like pushing more single-sided predicates would be adding to the "inefficiency" mentioned here, and since the predicates are (as I understand it) already handled in preprocessing for > I would prefer to see any type of predicate pushed into a union - even those containing complex expressions. This might be hard to implement, though, as I don't know whether the cloning methods are implemented for the entire ValueNode hierarchy. Sounds like an "itch" to me :) While it might indeed be nice to push predicates containing complex expressions, that's another enhancement of its own. I won't be doing that with my Thanks again--I can't say that enough--for reading the document. It's a huge one and I'm grateful for your time and feedback. Based on the 6 steps that I've outlined in
-- Phase 1 -- The first patch will implement the changes as described in "Step 5" of the document, namely: 5 - Ensure that the best access path for a UnionNode that pushes predicates is correctly saved during optimization and correctly retrieved when it comes time to finalize the query's overall access path. The phase 1 patch shouldn't have any functional effect on the codeline; it's just an incremental step toward the complete fix. -- Phase 2 -- The second patch will add code for "Step 1", which is: 1 - Add the ability to take a predicate and scope it to a target result set so that it can be pushed to that result set. The phase 2 patch will add code to the codeline that won't actually get executed until phase 3. Thus, like the phase 1 patch, the phase 2 patch should have no functional effects on the codeline. -- Phase 3 -- The third patch will be the one that does the core of the predicate pushing/pulling. This patch will cover Steps 2, 3, 4, and 6 of the steps outlined in the document. Once this patch is committed, the changes for phase 1 and phase 2 will take effect--so phase 3 is also where I'll add tests to make sure predicates are being pushed correctly. Note that after phase 3, any join predicates which are targeted for UNIONS will _always_ be pushed down into the union (assuming it's possible to do so). This means that, as discussed in the document, there will be cases where Derby originally chose to do a Hash join using the predicate but will now (after phase 3) push the predicate and do a nested loop join. This problem will then be addressed in phase 4. -- Phase 4 -- The phase 4 patch will address any unresolved issues from the first three phases, which right now means that it will include the necessary changes to allow the optimizer to make a cost-based decision about whether or not it should push the predicates, instead of always pushing them. The vast majority of the changes will go into the code as described in Here is my reply to Army's reply (also previously posted to derby-dev):
> > How could a predicate be pushable to only one side of a union? Can you provide an example of a predicate that can be pushed only to one side? > > > If we take something along the lines of: > > > select ... from > t2, > (select * from t1 union values (1,2), (3,4), (5,6)) X1 (a,b) > where X1.a = t2.i; > > > In this case the predicate X1.a = t2.i could be pushed to the left ("select * from t1") and used when reading T1, but couldn't be pushed to the VALUES clause because there's no underlying table. OK. One way to deal with this is to put a ProjectRestrictNode between the union node and the values clause as a place to park the predicate. To make things simple, you might want to always put ProjectRestrictNodes under both sides of the union during preprocessing (i.e. after binding but before optimization). In some cases the extra nodes won't be needed, but ProjectRestrictNodes (and the corresponding ProjectRestrictResultSets) are cheap. Also, you could eliminate unnecessary ProjectRestrictNodes at the end of optimization (possibly in modifyAccessPaths()). This approach would give better performance in some cases, and could simplify the code (since you wouldn't have to figure out when the predicates are pushable). > OK. One way to deal with this is to put a ProjectRestrictNode between the union node and
> the values clause as a place to park the predicate. To make things simple, you might want > to always put ProjectRestrictNodes under both sides of the union during preprocessing > (i.e. after binding but before optimization). In some cases the extra nodes won't be needed, > but ProjectRestrictNodes (and the corresponding ProjectRestrictResultSets) are cheap. > Also, you could eliminate unnecessary ProjectRestrictNodes at the end of optimization > (possibly in modifyAccessPaths()). Yes, that all sounds about right :) But I wonder if that couldn't be done later as follow-up work? Would you say that the changes I've proposed--namely, see if we can push the predicates to both sides and skip the push if not--are incorrect, or just that we could build on them to make them better? If they're incorrect, then I'll have to look at this more. But my feeling is that this is a(nother) way in which the pushing of join predicates could be made better once we have it working--i.e. something that would work well as a follow-up patch...? Attaching d805_phase1_v1.patch, which is the patch for "Phase 1" as defined in my earlier comment. The changes are mostly just what I described in the html document, except that I've added logic to recurse down a UNION chain where necessary.
I meant to run derbyall with these changes last night, but I messed up my environment, which means the test results were not meaningful. I ran "derbylang" this evening and everything passed, so I'm posting the patch. I will run the full derbyall suite tonight and will post results tomorrow.
A B made changes - 17/Feb/06 02:14 PM
A B made changes - 17/Feb/06 02:16 PM
> But I wonder if that couldn't be done later as follow-up work? Would you say that the changes I've proposed--namely, see if we can push the predicates to both sides and skip the push if not--are incorrect, or just that we could build on them to make them better?
Your changes are not incorrect. I just wonder whether they wouldn't be both simpler and more complete if you didn't have to worry about whether predicates were pushed to both sides. I have the feeling you're anxious to finish this phase of the project. It's OK with me if you check in this change with the current algorithm. If you do this, please add comments to the code saying what could be changed to make it possible to apply a pushed predicate to one side only. I ran derbyall last night (Windows 2000, Sun 1.4.2) with d805_phase1_v1.patch applied to my codeline and saw no new failures (only failure was xaSimplePositive.sql, which is not related to my changes).
That said, I should point out that these phase 1 changes may need to be enhanced/updated at some point in the near future, depending on the results of the thread I started on derby-dev: http://article.gmane.org/gmane.comp.apache.db.derby.devel/14836 The link between that thread and d805_phase1_v1.patch is that the phase 1 patch expects "trulyTheBestAccessPath" for an Optimizable to be set correctly, but the issue described in the given thread may lead to situations where trulyTheBestAccessPath doesn't hold the value that it should. Note, though, the d805_phase1_v1.patch changes alone do not not affect the functionality of the codeline, so I think the patch can still be committed. It's only when we get to Phase 3 that the issue described on derby-dev could start influencing the behavior of the codeline with my changes for predicate pushdown.
A B made changes - 18/Feb/06 04:00 AM
<Jeff>
> It's OK with me if you check in this change with the current algorithm. If you do this, > please add comments to the code saying what could be changed to make it possible > to apply a pushed predicate to one side only. Okay, will do. These comments will apply to the Phase 2 patch. Thanks for describing how the changes could work--that'll certain be a big help when it comes time to implement them. Note to reviewers/committers: I plan to post a more robust version of the Phase 1 patch sometime soon, so it might be good to hold off on reviewing/committing d805_phase1_v1.patch until the new version can be posted and reviewed. If it's agreed that the newer version is better (when I post it), then that version should be the one we commit.
Attaching a "version 2" of the Phase 1 patch, which does things in a more general and more robust way, based on discussed here:
http://article.gmane.org/gmane.comp.apache.db.derby.devel/14934 The new patch is "d805_phase1_v2.patch". I'm also attaching Note that even though d805_phase1_v2.patch is a different approach, it is still the case none of the existing Derby functionality should be affected by this change. The new code is exercised, but until we actually start pushing predicates, there should be no behavioral differences. I ran derbylang with this patch and IBM 1.4.2 with no failures. I haven't had time to run derbyall yet, but hope to do so tonight (my poor machine is working around the clock lately, so night-time is the only time I can run the full suite). Thanks in advance for review comments.
A B made changes - 19/Feb/06 12:44 PM
I ran derbyall on Windows 2000 using IBM JDK 1.4.2 with d805_phase1_v2.patch applied and saw no new failures--only xaSimplePositive and jdbcapi.SURTest.junit failed, neither of which is caused by these changes. So I'm marking this as "Patch available" for review/commit. Thanks.
A B made changes - 20/Feb/06 08:03 AM
While working with other changes for this issue, I came to realize that d805_phase1_v2 has one small problem. That patch assumes that an OptimizerImpl will always find a "best join order" before it attempts to "pull" any Optimizables and re-position them for another join order. But with the "JUMPING" functionality that the Optimizer does for queries with a large number of tables, it turns out that it is in fact possible to "pull" an Optimizable before finding a best join order. So I need to update the Phase 1 patch to account for this.
I already have the required fixes locally; I want to run derbylang to make sure nothing breaks, and then I will post another version of the patch. In the meantime, I'm unchecking the "patch available" box...
A B made changes - 22/Feb/06 08:14 AM
I've finalized changes for d805_phase1_v3.patch but still need to run derbyall tonight to make sure there are no new diffs (I meant to do it last night but ended up tracing through code well into the night instead).
In the meantime, I'm posting the first version of the Phase 2 patch here: d805_phase2_v1.patch. This patch handles the "scoping" logic for predicates as described in "Step 1" of The Phase 2 patch (d805_phase2_v1.patch) is independent of the Phase 1 patch and therefore can be committed before or after Phase 1--the order in which the two are committed doesn't actually matter so long as both are committed before Phase 3. The Phase 2 patch does the following: 1. Adds the necessary logic/methods to scope a predicate 2. Adds some utility methods and one new class (BaseTableNumbersVisitor) used for scoping predicates 3. Adds some other simple utility methods in preparation for predicate pushdown with Phase 3. None of the new methods will ever get called until Phase 3 is committed, so d805_phase2_v1.patch should have no affect on the codeline--it's just part of incremental development. Thanks to anyone who has a chance to review and/or commit...
A B made changes - 23/Feb/06 06:03 AM
A B made changes - 23/Feb/06 06:09 AM
Attaching an updated patch d805_phase1_v3.patch to address the issue in my last comment. The differences between d805_phase1_v3.patch and d805_phase1_v2.patch are summarized here:
1) There was a slight off-by-one error in the optimizer JUMP logic that was causing it to pull Optimizables prematurely. This error is what led me to realize that phase1_v2 was incomplete--however, the JUMP error _is_ still an error, so phas1_v3 fixes it (one-line change). 2) phase1_v3 adds logic to only reload best plans when required, instead of doing it every single time we pull (which is what phase1_v2 did). Since plan "reload" can be relatively expensive--especially for deeply-nested subqueries--it's better to only do it when it's required. 3) phase1_v3 adds logic to remember nested OptimizerImpl "best join orders" in addition to nested Optimizable "best access paths". The join orders have to saved with respect to outer queries, just like the access paths, but phase1_v2 did not do that. phase1_v3 does. I ran derbyall on Red Hat Linux with IBM 1.4 and saw one diff in predicatesIntoViews that matches the diff from If anyone has time to review, I'd be grateful... **NOTE to commiters: There are now two patches pending for this issue: d805_phase1_v3.patch and d805_phase2_v1.patch. The patches are independent of each other and can be committed in either order.
A B made changes - 23/Feb/06 11:23 AM
I am starting to review these patches, with the aim of submitting them to trunk. Let me know if any one has any comments on the patches so far.
Thanks Army for all the good work!
Attaching a "Phase 3" patch, d805_phase3_v1.patch, to implement the changes described in steps 2, 3, 4 and 6 of
1. Implements the changes as described in the HTML document attached to this issue (steps 2, 3, 4 and 6). 2. Adds a check in PredicateList.java to skip scoped predicates when trying to find a predicate to use for a hash join. This check is needed because if the predicate was scoped and pushed down from an outer query, then one of the operands points to the outer query and the other (scoped) operand points down into a subquery, which means we can't do the hash join. Without this check the optimizer might choose to do a hash scan using the predicate even though it won't actually be able to do the corresponding hash join. That means we would lose the opportunity to use the predicate for an index scan, and thus we could miss out on a much better (index-based) plan. 3. Makes a slight change to OptimizerImpl.addOrLoadBestPlanMappings to avoid doing extra work where possible. More specifically, skips the logic to save/restore bestJoinOrder if there is only one Optimizable in the list, as there's only one possible join order in that case and so we don't need to keep track of which join order is best for each of the outer queries. 4. Adds a new test, lang/predicatePushdown.sql, to the harness as part of the derbylang suite. This test runs a bunch of queries that relate to the pushing of predicates down into unions. Most of these test cases failed (either with compilation/execution errors or incorrect results) at one point or another while I worked on this issue, so I've chosen to include them all as part of the new test. The only way (that I know of) to tell if predicate pushdown is actually occuring is to print out the query plans, so the master file for this test is very large--almost 10,000 lines. For that reason, combined with the fact that I think additional predicate pushdown tests will be needed as support for pushdown grows, I decided to create a new test instead of adding the test cases to an existing test. Once the Phase 1, Phase 2, and Phase 3 patches have been applied, along with the patch for A - As described in B - The changes I've made for this issue--esp. for Phase 1--add a good amount of code to the optimization code path. This means that it's now possible for the optimizer to timeout "sooner" than it would have prior to these changse. By "sooner" I mean that the optimizer might not have a chance to look at as many join orders as it used to (because it takes longer for it to cost each join order, esp. with deeply nested subqueries). This means that there are situations where the optimizer pre- As originally stated when I broke this enhancement down into phases, the issues that remain after Phase 3 will be addressed in Phase 4 (and whatever follow-up phases are required thereafter). In the meantime, the surest way to get the full effect of predicate pushdown is to disable optimizer timeout, which can be done with the property derby.optimizer.noTimeout=true. Note that I set this property for the new predicatePushdown test, to ensure that the optimizer will choose the same plans across all machines. I ran derbyall on Red Hat Linux with IBM 1.4.2 and saw no new failures. ** NOTE ** though that this patch, d805_phase3_v1.patch, has a dependency on d805_phase2_v1.patch and also on the patch for I know the patch looks long, but keep in mind that a good 9,000 lines of the diff is just for the new test. I thought about posting the new test separately before the Phase 3 patch and then submitting an updated master file for Phase 3, but then the diff there would be even larger as most of the query plans would change. So I've just included the new test as part of the Phase 3 patch. So in short, the current patches for review-and-commit are as follow: 1. d805_phase1_v3.patch (already committed) 2. d805_phase2_v1.patch 3. d1007_v1.patch (for 4. d805_phase3_v1.patch (dependent on all of the previous patches). If anyone has the time to review/comment on any of these patches, I would most certainly appreciate it...
A B made changes - 02/Mar/06 07:11 AM
A B made changes - 03/Mar/06 06:33 AM
Small update to the lang/predicatePushdown.sql test to avoid truncation of query plans. This should solve the diff reported in the TinderBox results here:
http://www.multinet.no/~solberg/public/Apache/TinderBox_Derby/Limited/testSummary-382964.html This is a small change to a test only--no code changes--so if any commiter out there could commit, that'd be great.
A B made changes - 04/Mar/06 11:18 AM
Committed predPushdown_testFix.patch with revision 383062.
Andrew McIntyre made changes - 01/Apr/06 06:09 AM
Attaching two things to this issue:
1 - 2 - d805_phase4_v1.patch: a patch that implements the Phase 4 changes described in Section VII of the _v4.html document. The phase4_v1 patch also fixes a small error in BaseTableNumberVisitor.java and updates the lang/predicatePushdown.sql test to account for the new Phase 4 behavior. As part of the changes for predicatePushdown.sql I have added a lot more data to the tables, which means the test queries take longer to run. So where this test used to run in under a minute on my machine, it now takes over 4 minutes to complete. That's expected given my changes, but I thought I should state that explicitly in case anyone notices. I ran derbyall on Linux Redhat using sane jars with IBM 1.4.2 and saw no new failures. I also ran the updated lang/predicatePushdown.sql test against IBM wsdd5.6 and Sun 1.5 and the test passed without problems. If anyone has time to review and commit, that would be great. Note that while the patch is around 5500 lines, only about 600 lines of it are actual code changes; the rest is the result of the updated predicatePushdown.sql test and the corresponding output (a lot of query plans are printed in this test, so the output is quite large).
A B made changes - 08/Apr/06 08:23 AM
[ Attaching d805_phase4_v2.patch, which only has a 1-line diff from the v1 patch ]
After stepping away from the code for the weekend and then re-reading the document for Phase 4, I noticed a small (one-line) error in the first version of my Phase 4 changes. In the document I correctly wrote the following regarding the need to revert access paths for subtrees when an Optimizable's current permutation is found to be "not the best": <begin_quote> In order to fix this behavior, we have to add logic to save the "best plans" for an Optimizable's subtree before each new permutation, and then if the current permutation isn't the best one, we need to revert the entire subtree's plans back to what they were for the "best" permutation. <end_quote> But the code that I wrote didn't quite match this statement. In the following diff: + // If the final path that we considered for curOpt was _not_ the best + // path for this round, then we need to revert back to whatever the + // best plan for curOpt was this round. Note that the cost estimate + // for bestAccessPath could be null here if the last path that we + // checked was the only one possible for this round. + if (!retval && + (curOpt.getBestAccessPath().getCostEstimate() != null) && + (curOpt.getCurrentAccessPath().getCostEstimate() != null)) + { the check for "!retval" means that we will only revert the plans if we have exhausted all permutations for the current Optimizable. But as I wrote in the document, we need to check (and potentially revert) the subtree's plans after _each_ new permutation, not just after the final one. To show why this matters, assume we have some Optimizable with three possible permutations P1, P2, and P3, of which P1 is the "best". With the code as I wrote it in d805_phase4_v1.patch, we would end up doing the following: -- Pick permutation P1. -- Since P1 is the first permutation, there is no previous "best" path so there's nothing to save. -- Optimize P1 and get the cost; say it's 25. -- Since P1 isn't our final permutation, we don't revert. So the plans corresponding to P1 are still saved at the Optimizable and its subtree. -- Pick permutation P2. -- Before optimizing P2, save the best paths for the current Optimizable's subtree; this means we'll save off the paths corresponding to P1. -- Optimize P2 and get the cost; say it's 50. -- Since P2 isn't our final permutation, we *don't* revert. So the plans corresponding to P2 are still saved at the Optimizable and its subtree. -- Pick permutation P3. -- Before optimizing P3, save the best paths for the current Optimizable's subtree; this means we'll save off the paths corresponding to *P2*. (This is wrong.) -- Optimize P3 and get the cost; say it's 75. -- Since P3 is our final permutation, and since it is NOT the best permutation that we found, we'll now revert the paths of this Optimizable's subtree. That mean's we'll load the plans that we most recently saved and generate the query plan based on those. But that will give us the plans for _P2_, when what we wanted was the plans for _P1_. So d805_phase4_v2.patch fixes this by removing the check for "!retval" in the above diff. Then after we optimize P2 and get the cost, we'll see that it's not the best so we'll immediately revert the paths back to what they were for P1. The P1 paths (instead of the P2 paths) will then get saved off before optimization of P3 starts, and thus when we do the "revert" after P3, we'll load the correct paths (P1). As it turns out this particular change doesn't change any of the other Phase 4 behavior. The reason is that an Optimizable that is not a FromBaseTable only (currently) has two permutations--nested loop join and hash join--and thus the above scenario can't currently happen. FromBaseTable's can have more than two permutations, but since they don't have subtrees beneath them this whole save/revert logic is not required. Thus even though the v1 patch was technically incorrect, everything ran as expected. For accuracy, though, I think this small error should be fixed, so that's what the second version of the phase 4 patch (d805_phase4_v2.patch) does. I've also added a _v5.html document that is almost identical to _v4 except for two things: 1) the small change that I just described, and 2) I've included a note with the info about index statistics that Andrew posted to derby-dev (thanks to Bryan for asking the question and to Andrew for providing the answer). Just to be safe I re-ran derbyall on Red Hat Linux with ibm142 sane jars and saw no failures.
A B made changes - 11/Apr/06 05:24 AM
All four patches for each phase have been committed to trunk. Thanks Army.
Attaching a follow-up patch, d805_followup_v1.patch, that addresses some issues which remained after Phase 4 was committed. In particular:
1) Added logic to skip predicate pushdown when either of the predicate's column references does not point to a base table. This can happen if, for example, the column reference points to a literal or an aggregate expression. Further work is required for such situations in order to correctly "remap" the column reference to its source (or at least, to figure out what exactly it means to remap a ColumnReference that doesn't point to a base table, and then to implement the appropriate changes)--so in the meantime, I've just decided to skip pushing the predicate for now. 2) Added logic to correctly set the column number of a "scoped" reference based on whether or not the reference points to a base table. Existing comments in the relevant sections of code describe why we need to set the column numbers for references pointing to base tables, but the code itself didn't actually check for the base table condition--it set the column number for all scoped references, which wasn't always correct. 3) In cases where a ColumnReference's source ResultColumn's expression is not another ColumnReference, made it so that the scope operation will return a clone of ColumnReference (instead of the ColumnReference itself) since that ColumnReference will be pushed to two result sets. 4) Added corresponding test cases to the lang/predicatePushdown.sql test and updated the master file accordingly. I ran derbyall on Red Hat Linux with ibm142 and saw no new failures. If anyone has time to review, I'd be grateful. Thanks.
A B made changes - 25/Apr/06 11:08 AM
Testing effect of
various complex queries were created and run against 10.1 and 10.2 versions and the results and the test descriptions are being attached to this issue. The overall result after running these tests looks promising and 10.2 performs way better than 10.1. Still it will be good to have an improvement to the existing execution model to use multiple indexes for table scan. I have filed seperate jira entry for that.
Manjula Kutty made changes - 27/Apr/06 01:27 PM
Attaching patch to port all phases of this issue plus changes for
http://article.gmane.org/gmane.comp.apache.db.derby.devel/19330 Attaching here for ease of download and tracking.
A B made changes - 28/Apr/06 05:54 AM
I have committed this patch to 10.1 branch. Thanks for pulling all relavent checkins.
Fix has been merged into 10.2 (trunk), and 10.1 branches.
Satheesh Bandaram made changes - 02/May/06 06:35 AM
Satheesh Bandaram made changes - 02/May/06 06:35 AM
Andrew McIntyre made changes - 17/Aug/06 02:06 AM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 - Add the ability to take a predicate and scope it to a target result set so that it can be pushed to that result set.
2 - Implement the "pushOptPredicate()" and "optimizeIt()" methods for UnionNodes. The former method should take predicates that are passed into the UnionNode from outer queries, scope them (per step 1) for the left and right children of the UnionNode, and store them locally. The latter method should then pass the scoped predicates down to both children so that they can use the predicates in their own optimize()/optimizeIt() calls.
3 - Take scoped predicates (created in step 1) that are pushed into the children result sets of a UnionNode (per step 2) and allow the the children to push the scoped predicates even further down the tree, until we eventually get them to a base table.
4 - Make sure predicates that are pushed down into subqueries of a UnionNode are correctly "pulled" back up (if they are unscoped) or discarded (if they are scoped) for every permutation seen during optimization.
5 - Ensure that the best access path for a UnionNode that pushes predicates is correctly saved during optimization and correctly retrieved when it comes time to finalize the query's overall access path.
6 - And finally, when optimization is complete, make sure all relevant predicates are pushed down the tree one last time and left there, in preparation for code generation.
See
DERBY-805.html for all the gory details.I have made the changes described in this document locally and they all seem to work, with a couple of exceptions as noted at the end of the document. I plan to break the changes down into separate patches where it's possible to do so, and will be posting those patches in the coming days. In the meantime, if anyone has time to review this document and provide feedback/direction/suggestion, I would be grateful. As I myself am still trying to learn all the subtleties of Derby optimization, the more feedback I get, the better...