DERBY-3094 (follow-up)
[ Note: This page should be viewed with fixed-length font ]
This
is a follow-up document to "notes.html" as attached to DERBY-3094, which
is great description of the problem and of the proposed solution
(TwoPassForHavingClauseAlso.diff).
In
short, the TwoPassVisitorWithCommentsAndTests.diff fixes the query in
"notes.html" by delaying the "substitution" phase for
ColumnReferences so that it occurs _after_ the "substitution" phase
for compound expressions. With that
patch applied we end up doing the substitutions in an order which leads to the
correct behavior for the query discussed in "notes.html". So that's good.
But
I think there may still be an issue. As
seen with the following example, ColumnReference substitution isn't the only
way that one GroupBy ResultColumn substitution can interfere with another. More generally speaking, anytime we have
multiple GROUP BY expressions such that:
a.
the
SELECT has at least one AGGREGATE in its RCL; AND
b.
one
GroupBy expression gbRC0 contains another expression gbRC1; AND
c.
gbRC1
appears BEFORE gbRC0 in the GROUP BY list; AND
d.
gbRC0
contains at least one column reference that either 1) does NOT appear in the
group by list, or 2) appears in the group by list but only as part of a compound
expression; AND
e.
gbRC0
appears in the SELECT's result column list
then
the query is going to suffer from DERBY-3094.
Note that the above criteria match the criteria listed in
"notes.html" under the heading "The Problem, abstractly",
with two slight tweaks: first, I added the fact that the SELECT list must have
an AGGREGATE in its RCL; second, I tried to phrase it so that gbRC1 can be any
expression, while "notes.html" considers gbRC1 to be a simple ColumnReference. The latter is the key difference here.
Let
the example query be:
select
a*((a+b)/1.0), count(*) from xx group by a+b, a*((a+b)/1.0);
This
example satisfies the five criteria mentioned above and thus is
problematic. In this case gbRC0 is
"a*((a+b)/1.0)" and gbRC1 is "a+b".
---------
Example
A
---------
Let’s
first go through this using the same discussion format as in
"notes.html"...
When
GroupByNode.init calls addUnAggColumns(), the
situation is as follows:
We
then start into the "for each entry in the GROUP BY list" loop. The
first entry is expression "a+b", so we process that expression:
Now
we process the second entry in the GROUP BY list, "a*((a+b)/1.0)":
At
this point, to quote "notes.html", "the damage has now been
done." When it comes time to execute,
the expression "a+b" will point to the parent RCL at execution time,
instead of to the GroupBy RCL, and thus will fail with an NPE.
---------
Example
B
---------
Just
to get things clear in my head, I went through the same example a second time,
but this time with a slightly different discussion format. May be useful to others, maybe not...
SQL:
select a*((a+b)/1.0), count(*) from xx group by a+b,
a*((a+b)/1.0);
To
begin with, we have the following GroupBy columns (numbers within percent signs
are simply used for ease of reference):
·
Let
gbRC[0] => %0% BinaryRelationalOp (+)
/ \
%1% ColRef("A") %2% ColRef("B")
·
Let
gbRC[1] => %3% BinaryRelationalOp (*)
/ \
%4% ColRef("A") %5% BinaryRelationalOp (/)
/ \
%6% BinaryRelationalOp(+) %7% Constant(1.0)
/ \
%8% ColRef("A") %9% ColRef("B")
And
the following SELECT result columns:
·
Let
parentRC[0] => %10% BinaryRelationalOp (*)
/ \
%11% ColRef("A") %12% BinaryRelationalOp (/)
/ \
%13% BinaryRelationalOp(+) %14% Constant(1.0)
/ \
%15% ColRef("A") %16% ColRef("B")
·
Let
parentRC[1] => %17% AggregateNode ("count(*)")
Note
that gbRC[1] and parentRC[0] are DIFFERENT OBJECTS,
even though they are equivalent trees.
Now
we iterate through the gbRCs and for each one, find any equivalent nodes that
exist in the SELECT node's RCL--that is, in the parentRC array.
1.
Start
with gbRC[0].
In searching the parentRC array we find one object which is equivalent
to gbRC[0], and that's object %13%. So we replace that node with a VCN pointing
to gbRC[0]:
parentRC[0] => %10% BinaryRelationalOp (*)
/ \
%11% ColRef("A") %12% BinaryRelationalOp (/)
/ \
VirtualColumnNode %14% Constant(1.0)
|
%0% BinaryRelationalOp(+)
/ \
%1%
ColRef("A") %2%
ColRef("B")
2.
Now
take gbRC[1].
We search the parentRC array for any objects that are equivalent to the
BinaryRelationOp node identified as %3%.
Due to the fact that a) gbRC[1] and parentRC[0] are different objects,
and b) in step 1, we substituted the column reference %13% with a
VirtualColumnNode, parentRC[0] is **NO LONGER** equivalent to gbRC[1]. Thus we fill not find any nodes that are equivalent
to gbRC[1], so we won't replace anything else. So the reference to "%11% ColRef('A')" continues to point to the SELECT's list
instead of to the GroupByNode's list, and the query will fail with an
execution-time NPE.
----------
Conclusion
----------
All of
that said, though, the TwoPassVisitorWithCommentsAndTests.diff solves the
specific query that was reported (and described in "notes.html"), and
so it makes things better, which is great.
Thus I think it would be fine to commit the patch as it is and either
deal with queries like the one above in a follow-up patch, or file a separate
Jira altogether.