DERBY-3094 (follow-up)

 

[ Note: This page should be viewed with fixed-length font ]

 

This is a follow-up document to "notes.html" as attached to DERBY-3094, which is great description of the problem and of the proposed solution (TwoPassForHavingClauseAlso.diff).

 

In short, the TwoPassVisitorWithCommentsAndTests.diff fixes the query in "notes.html" by delaying the "substitution" phase for ColumnReferences so that it occurs _after_ the "substitution" phase for compound expressions.  With that patch applied we end up doing the substitutions in an order which leads to the correct behavior for the query discussed in "notes.html".  So that's good.

 

But I think there may still be an issue.  As seen with the following example, ColumnReference substitution isn't the only way that one GroupBy ResultColumn substitution can interfere with another.  More generally speaking, anytime we have multiple GROUP BY expressions such that:

 

a.     the SELECT has at least one AGGREGATE in its RCL; AND

b.     one GroupBy expression gbRC0 contains another expression gbRC1; AND

c.     gbRC1 appears BEFORE gbRC0 in the GROUP BY list; AND

d.     gbRC0 contains at least one column reference that either 1) does NOT appear in the group by list, or 2) appears in the group by list but only as part of a compound expression; AND

e.     gbRC0 appears in the SELECT's result column list

 

then the query is going to suffer from DERBY-3094.  Note that the above criteria match the criteria listed in "notes.html" under the heading "The Problem, abstractly", with two slight tweaks: first, I added the fact that the SELECT list must have an AGGREGATE in its RCL; second, I tried to phrase it so that gbRC1 can be any expression, while "notes.html" considers gbRC1 to be a simple ColumnReference.  The latter is the key difference here.

 

Let the example query be:

 

  select a*((a+b)/1.0), count(*) from xx group by a+b, a*((a+b)/1.0);

 

This example satisfies the five criteria mentioned above and thus is problematic.  In this case gbRC0 is "a*((a+b)/1.0)" and gbRC1 is "a+b".

 

---------

Example A

---------

 

Let’s first go through this using the same discussion format as in "notes.html"...

 

When GroupByNode.init calls addUnAggColumns(), the situation is as follows:

We then start into the "for each entry in the GROUP BY list" loop. The first entry is expression "a+b", so we process that expression:

Now we process the second entry in the GROUP BY list, "a*((a+b)/1.0)":

At this point, to quote "notes.html", "the damage has now been done."  When it comes time to execute, the expression "a+b" will point to the parent RCL at execution time, instead of to the GroupBy RCL, and thus will fail with an NPE.

 

---------

Example B

---------

 

Just to get things clear in my head, I went through the same example a second time, but this time with a slightly different discussion format.  May be useful to others, maybe not...

 

SQL: select a*((a+b)/1.0), count(*) from xx group by a+b, a*((a+b)/1.0);

 

To begin with, we have the following GroupBy columns (numbers within percent signs are simply used for ease of reference):

 

·          Let gbRC[0] => %0% BinaryRelationalOp (+)
                  /         \
      %1% ColRef("A")   %2% ColRef("B")

·          Let gbRC[1] => %3% BinaryRelationalOp (*)
                 /         \
   %4% ColRef("A")   %5% BinaryRelationalOp (/)
                           /         \
     %6% BinaryRelationalOp(+)    %7% Constant(1.0)
             /         \
  %8% ColRef("A")   %9% ColRef("B")

 

And the following SELECT result columns:

 

·          Let parentRC[0] => %10% BinaryRelationalOp (*)
                     /         \
       %11% ColRef("A")  %12% BinaryRelationalOp (/)
                               /         \
          %13% BinaryRelationalOp(+)  %14% Constant(1.0)
                  /         \
      %15% ColRef("A")   %16% ColRef("B")

·          Let parentRC[1] => %17% AggregateNode ("count(*)")

 

Note that gbRC[1] and parentRC[0] are DIFFERENT OBJECTS, even though they are equivalent trees.

 

Now we iterate through the gbRCs and for each one, find any equivalent nodes that exist in the SELECT node's RCL--that is, in the parentRC array.

 

1.     Start with gbRC[0].  In searching the parentRC array we find one object which is equivalent to gbRC[0], and that's object %13%.  So we replace that node with a VCN pointing to gbRC[0]:

parentRC[0] => %10% BinaryRelationalOp (*)
                     /         \
       %11% ColRef("A")  %12% BinaryRelationalOp (/)
                               /         \
                
VirtualColumnNode  %14% Constant(1.0)
                        |
             
%0% BinaryRelationalOp(+)
                   /         \
       
%1% ColRef("A")   %2% ColRef("B")

2.     Now take gbRC[1].  We search the parentRC array for any objects that are equivalent to the BinaryRelationOp node identified as %3%.  Due to the fact that a) gbRC[1] and parentRC[0] are different objects, and b) in step 1, we substituted the column reference %13% with a VirtualColumnNode, parentRC[0] is **NO LONGER** equivalent to gbRC[1].  Thus we fill not find any nodes that are equivalent to gbRC[1], so we won't replace anything else.  So the reference to "%11% ColRef('A')" continues to point to the SELECT's list instead of to the GroupByNode's list, and the query will fail with an execution-time NPE.

 

----------

Conclusion

----------

 

All of that said, though, the TwoPassVisitorWithCommentsAndTests.diff solves the specific query that was reported (and described in "notes.html"), and so it makes things better, which is great.  Thus I think it would be fine to commit the patch as it is and either deal with queries like the one above in a follow-up patch, or file a separate Jira altogether.

 

02/22/2008