|
I reproduced this in my sandbox even with the ready-to-commit patch for
Good news is it throws the NPE with both sane and insane builds, and here's the callstack with line numbers --- 2007-11-20 11:43:05.545 GMT Thread[main,5,main] (XID = 173), (SESSIONID = 0), (DATABASE = d3094), (DRDAID = null), Failed Statement is: select a, a*(b/100.000000), count(*) from xx group by a, a*(b/100.000000) ERROR 38000: The exception 'java.lang.NullPointerException' was thrown while evaluating an expression. at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:294) at org.apache.derby.iapi.error.StandardException.unexpectedUserException(StandardException.java:554) at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:164) at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.doProjection(ProjectRestrictResultSet.java:488) at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:291) at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(BasicNoPutResultSetImpl.java:463) at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:424) at org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:368) at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:382) at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:338) at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:241) at org.apache.derby.tools.JDBCDisplayUtil.DisplayResults(JDBCDisplayUtil.java:229) at org.apache.derby.impl.tools.ij.utilMain.displayResult(utilMain.java:435) at org.apache.derby.impl.tools.ij.utilMain.doCatch(utilMain.java:509) at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:350) at org.apache.derby.impl.tools.ij.utilMain.go(utilMain.java:248) at org.apache.derby.impl.tools.ij.Main.go(Main.java:215) at org.apache.derby.impl.tools.ij.Main.mainCore(Main.java:181) at org.apache.derby.impl.tools.ij.Main.main(Main.java:73) at org.apache.derby.tools.ij.main(ij.java:59) Caused by: java.lang.NullPointerException at org.apache.derby.exe.ac601a400fx0116x5cddx8367x000000107e400.e3(Unknown Source) at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:145) ... 17 more ============= begin nested exception, level (1) =========== java.lang.NullPointerException at org.apache.derby.exe.ac601a400fx0116x5cddx8367x000000107e400.e3(Unknown Source) at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGeneratedClass.java:145) at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.doProjection(ProjectRestrictResultSet.java:488) at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.getNextRowCore(ProjectRestrictResultSet.java:291) at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.getNextRow(BasicNoPutResultSetImpl.java:463) at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(EmbedResultSet.java:424) at org.apache.derby.impl.jdbc.EmbedResultSet.next(EmbedResultSet.java:368) at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:382) at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:338) at org.apache.derby.tools.JDBCDisplayUtil.indent_DisplayResults(JDBCDisplayUtil.java:241) at org.apache.derby.tools.JDBCDisplayUtil.DisplayResults(JDBCDisplayUtil.java:229) at org.apache.derby.impl.tools.ij.utilMain.displayResult(utilMain.java:435) at org.apache.derby.impl.tools.ij.utilMain.doCatch(utilMain.java:509) at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:350) at org.apache.derby.impl.tools.ij.utilMain.go(utilMain.java:248) at org.apache.derby.impl.tools.ij.Main.go(Main.java:215) at org.apache.derby.impl.tools.ij.Main.mainCore(Main.java:181) at org.apache.derby.impl.tools.ij.Main.main(Main.java:73) at org.apache.derby.tools.ij.main(ij.java:59) -- ProjectRestrictResultSet seems to consider the obvious case where projection == null, but it still fails // Use reflection to do as much of projection as required if (projection != null) { result = (ExecRow) projection.invoke(activation); <== ln 488 } else { result = mappedResultRow; } Dropping the count(*) in the debugger actually works:
ij> select a, a*(b/100.000000) from xx group by a, a*(b/100.000000); A |2 --------------------------------------------- 2.0 |0.06 1 row selected I traced through the ProjectRestrictResultSet code and its source GroupedAggregateResultSet, but could not immediately see where the NPE originated. I wasn't really going to hunt this one down, but tracing a little further I see BaseActivation.getColumnFromRow() returns null @ 1359 with a comment that "this happens".
if( row[rsNumber] == null) { /* This actually happens. NoPutResultSetImpl.clearOrderableCache attempts to prefetch invariant values * into a cache. This fails in some deeply nested joins. See Beetle 4736 and 4880. */ return null; <==ln 1359 } This in turn makes BaseActivation.getDataValueFactory() return null, and in the end DataValueFactoryImpl.getDecimalDataValue(NumberValue) returning null. And the NPE surfaces. Hi Thomas, thanks for investigating this. Your analysis is very interesting. You might
want to have a look at you are studying. The patch for
Not surprisingly we run into another NPE in BaseActivation on the return line following the original if() after patching. return row[rsNumber].getColumn(colId); as row[rsNumber] still is null in our case. It would initially seem both If the patch for It also seems that this bug only manifests itself if there is an aggregation function *and* a decimal operation.
No aggregation works: ij> select a, a*(b/100.000000) from xx group by a, a*(b/100.000000); A |2 --------------------------------------------- 2.0 |0.06 1 row selected No decimal operation works: ij> select a, b, count(*) from xx group by a, b; A |B |3 --------------------------------------------------------- 2.0 |3.0 |1 1 row selected Seemingly independent of type of decimal operation: ij> select a, a/b from xx group by a, a/b; A |2 --------------------------------------------- 2.0 |0.6666666666666666 1 row selected But add the aggregate *and* the decimal operation, and we see the NPE. ij> select a, a*b, count(*) from xx group by a, a*b; A |2 |3 --------------------------------------------------------- ERROR 38000: The exception 'java.lang.NullPointerException' was thrown while evaluating an expression. ERROR XJ001: Java exception: ': java.lang.NullPointerException'. ij> Hi Thomas, thanks for studying this problem. I think you should mark the issue as assigned
to you to indicate that you are actively studying it. > row[rsNumber] still is null in our case Sorry, Thomas, I think I wasn't very clear in my other comment. it's related only in the sense that it discusses the getColumnFromRow() code. My belief is that the "if" statement in getColumnFromRow is unnecessary and incorrect, because there should be no situations in which Derby ever needs to fetch a column's value from a row prior to the initialization/population of that row with valid values. The various cases that I looked at before filing which, due to other problems, Derby was accessing the internal row/column data structures in an inconsistent manner. For example, with this particular problem, I think that you could investigate how the ResultColumn instances for the aggregate function and the arithmetic expression column are built up, and how their row and column number values are manipulated during bind processing. Here are some things to pay particular attention to: - when the invalid call to getColumnFromRow is made, which actual table and column is being dereferenced? - what does the generated "e3" method for your case look like? You can view this code using the techniques described at http://wiki.apache.org/db-derby/DumpClassFile Generated method "e3" is the "projection" method for the generated ProjectRestrictResultSet.
Here is a snippet of the generated code: private ResultSet fillResultSet() throws StandardException { getLanguageConnectionContext().getAuthorizer().authorize(this, 1); return getResultSetFactory().getScrollInsensitiveResultSet( getResultSetFactory().getProjectRestrictResultSet( getResultSetFactory().getGroupedAggregateResultSet( getResultSetFactory().getProjectRestrictResultSet( getResultSetFactory().getBulkTableScanResultSet( this, 1024, 5, getMethod("e0"), 2, null, -1, null, -1, true, e1, "XX", null, null, false, false, -1, -1, 7, false, 0, 16, false, 6D, 100.404D), null, getMethod("e1"), 3, null, 4, false, true, 6D, 100.404D), false, 3, 2, getMethod("e2"), 224, 1, 6D, 100.404D), null, getMethod("e3"), 4, null, 1, false, true, 1D, 100.404D), this, 0, 3, getScrollable(), 1D, 100.404D); } Note that getMethod("e3") is the 3rd argument to getProjectRestrictResultset() Note that there are *two* ProjectRestrictResultSet instances in the tree:
- the inner PRN operates on the bulk table scan, and projects columns "a" and "b" from table "xx" out of the table scan using projection "e1". - then the GroupedAggregateResultSet processes the rows, and computes the COUNT(*) into the 3rd column of the template row - then the outer PRN takes the grouped results and projects columns "a", "(expression)", and "COUNT(*)" out of the grouped result row. At the instant of the crash, the outer PRN is performing that final projection, and it is looking for a row from result set 2. But this is wrong; it ought to be looking for a row from result set 1. In the debugger, I can see that result set 1's current row is correct: it has the right columns, with the right values. I can also see that, *later* in projection method "e3", there is another call to getColumnFromRow() which passes result set number 1. So that seems like additional evidence that the core of the problem is that projection method "e3" generated invalid code which is trying to access result set 2 when it should have been accessing result set 1. Since the column in question is column 2, which is the expression a*(b/100.000000), I'm wondering whether the inner PRN and the outer PRN are somehow accidentally sharing the expression information for this result column expression, when one of them should have renumbered the result set for the expression to point to the different result set. This feels really familiar, like we just worked on some problems with result set number handling and result column expressions recently; I'll spend a bit of time searching JIRA to see if I can refresh that memory. At generate time, as the code analyzes the expression a*(b/100.000000),
for the outer PRN, the sourceColumn's resultSetNumber for the ColumnReference "a" is '1', while for "b" it is '2'. So the overall expression is inconsistent; some of the column reference point to the correct result set, others do not. This makes me wonder whether there is some code which traverses ResultColumn expression trees, updating the resultSetNumber, that mistakenly visits the leftOperand of the expression tree but does not visit the rightOperand. "a" is seen while visiting the leftOperand, but to visit "b" one must visit the rightOperand. I thought this was a theory worth recording. As Thomas noted, it's interesting to see the differences between what provokes
this problem, and what does not. For example, this query works fine: select a*(b/1.0), count(*) from xx group by a*(b/1.0); but this one gets the NPE: select a*(b/1.0), count(*) from xx group by a,a*(b/1.0); I've pretty much convinced myself that the problem happens during GroupByNode.init(),
where we take the original PRNode for the core SELECT statement, and re-write that section of the tree to insert a new GroupByNode above that PRNode, and a new PRNode above the GroupByNode, and manipulate the various ResultColumn lists in various ways. But I'm not sure precisely what's going wrong: - is ResultColumnList.cloneMe failing to process sub-expression column references? - are the bindResultColumnToExpression calls in addUnAggColumns supposed to examine the sub-expressions? - or perhaps somewhere else entirely in GroupByNode.init? OK, I think I'm getting close. Note the following:
- select a*(b/1.0), count(*) from xx group by a*(b/1.0); -- WORKS - select a*(b/1.0), count(*) from xx group by a,a*(b/1.0); -- FAILS - select a*(b/1.0), count(*) from xx group by a,b,a*(b/1.0); -- WORKS I think the problem is related to these lines in GroupByNode.java (near line 388 or so): SubstituteExpressionVisitor se = new SubstituteExpressionVisitor( gbc.getColumnExpression(), vc, AggregateNode.class); parent.getResultColumns().accept(se); This code is processing the ResultColumn instances in the "parent" (outer) PRNode to update their ColumnReference values. But (I think) this code only updates the ColumnReference values for those columns which are directly referenced as column references in the grouping list. That's why the references to column "a" in the expression are correctly fixed up, but ther references to column "b" are not, because "b" wasn't directly in the group by list. If true, this suggests a possible workaround: include all columns that are mentioned by any expressions in the group by list redundantly, as simple column references. I'll continue to study GroupByNode.addUnAggColumns, to see if I can construct a possible alternate way to process the columns. Reversing the order of the items in the group by list also avoids the NPE:
- select a*(b/1.0), count(*) from xx group by a*(b/1.0), a; -- WORKS This possibly provides a simpler workaround. My first attempt to fix this was a failure.
I made the repro script pass, but broke various other GROUP BY constructions. I'm attaching the diff anyway (modifyVisitorDoesntWork.diff), because I feel like it's at least going in the right direction. I'm convinced that the problem involves the various manipulations of the result column lists performed by GroupByNode.init, but I'm not yet sure what change to make. I think it's instructive to note that, with this patch applied, the following statement gets "b" and "a" backward: ij> select b, a, count(*) from xx group by b, a; B |A |3 --------------------------------------------------------- 2.0 |3.0 |1 The proper answer, of course, would be (3.0, 2.0, 1) I've been contemplating the following short script, trying to figure out what the
correct behavior of Derby ought to be. Note that I haven't yet thrown in any ORDER BY or HAVING examples; I'm just trying to understand the interaction between the select list and the group by list. Some of these queries work, some are rejected, and one is the Some of the queries which are rejected seem like they ought to work, and some of the queries which work seem like they ought to be rejected. create table test (c1 int, c2 int, c3 int, c4 int); insert into test values (1, 10, 100, 1000); insert into test values (1, 11, 100, 1001); insert into test values (2, 10, 100, 1000); insert into test values (2, 11, 101, 1001); insert into test values (2, 11, 101, 1000); select * from test; select c1+c2, sum(c3) from test group by c1,c2; select c1+c2, sum(c3) from test group by c1+c2; select c1+c2, sum(c3) from test group by c1+c2,c1; select c1,c2, sum(c3) from test group by c1,c2,c1+c2; -- Next query is refused with 42Y30 (select list contains invalid expression): select c1+c2, sum(c3) from test group by c1; -- Next query gets a runtime NPE ( select c1+c2, sum(c3) from test group by c1,c1+c2; -- Next query is refused with 42Y30 (select list contains invalid expression): select c1,c2, sum(c3) from test group by c1+c2,c1; -- Next query is refused with 42Y30 (select list contains invalid expression): select c1+c2, sum(c3) from test group by 1; -- Next query is refused with 42X04 (Column 'EXPR' not in any table) select c1+c2 as expr, sum(c3) from test group by expr; -- Next query is refused with 42X04 (Column 'C1A' not in any table) select c1 as c1a, c2, sum(c3) from test group by c1a,c2; select c1 as c2, sum(c3) from test group by c1,c2; -- Next query is refused with 42Y30 (select list contains invalid expression): select c1 as c2, sum(c3) from test group by c2; select c1 as c2, sum(c3) from test group by c1; The challenge is a) to ensure that we understand what the intended behavior of Derby should be b) to construct a modification to the current head of trunk that resolves the NPE but still preserves all the other desired behaviors. I'm hoping that collecting various test cases like this will help me understand this. GroupByExpressionTest.java contains the following statement:
select (c1+c2)+1, sum(c3) from test group by c1+c2 With the current Derby code, this statement is accepted as legal and produces results. However, I'm not sure it ought to be accepted, as the select list contains a non-aggregate expression which does not appear in the group by list. GroupByExpressionTest.java also contains this very intriguing statement:
select (c1+c2), sum(c3)+(c1+c2) from test group by c1+c2 With the current Derby code, this statement is accepted as legal and produces results. Attached is 'twoPass.diff', a patch proposal.
The idea of this patch is that, during the GroupBy tree rewriting, we replace ResultColumn expressions with VirtualColumnNode references in two passes: - we first handle all expression replacements - then once all the expressions have been replaced, we replace all column references. This patch fixes the repro script, and makes *almost all* of the regression tests pass. However, two of the test cases in GroupByExpressionTest.java fail; these are the two test cases that I listed in my previous two comments. It's not clear to me whether these two test cases are valid or not, they are certainly very unusual expressions and I think an argument can be made that they should be rejected by the compiler as invalid. Please consider this patch as "for discussion", intended to record a possible alternate implementation, and to stimulate some feedback regarding what the correct rules ought to be for correlating elements in the GROUP BY list against elements in the SELECT list. > It's not clear to me whether [two of the test cases in
> GroupByExpressionTest.java] are valid or not, I have not yet had a chance to look at the patch, but I ran those two queries--along with all of the queries posted in your Feb 17 comment--against DB2 v8. From what I can tell, the (clean) Derby trunk and DB2 v8 agree in all of these cases--i.e. they either both throw an error or they return the same results (with the notable exception of the NPE that this issue is trying to fix). Not sure if that's helpful or not... Here's an interesting essay on the sort of subtle problems that can arise
from not being crystal-clear about the semantics of column references and expressions in GROUP BY clauses: http://odetocode.com/Blogs/scott/archive/2005/04/17/1214.aspx The Postgres docs (http://www.postgresql.org/docs/8.0/interactive/sql-select.html#SQL-GROUPBY)
have this to say about the column-name-versus-column-alias subtlety referred to in the previous comment: In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name. It seems to me preferable to reject the ambiguous query rather than producing possibly surprising results, but I can see why ambiguity-resolving behaviors are also worthwhile. Attached is 'TwoPassVisitor.diff', the first working code proposal.
With this patch, the repro script passes, and all regression tests pass. I still need to: - add a bunch of new test cases - add comments to the code - add a writeup explaining the ideas behind this patch But at least I now have code that appears to add the desired new behavior while not breaking any (known) existing behaviors. Thanks Army for checking the queries against DB2, that's quite helpful.
It seems like there might be a similar problem with the HAVING clause. Consider:
ij> select c1+c2, sum(c3) from test group by c1+c2; 1 |2 ----------------------- 11 |100 12 |200 13 |202 3 rows selected ij> select c1+c2, sum(c3) from test group by c1,c1+c2; 1 |2 ----------------------- 11 |100 12 |100 12 |100 13 |202 4 rows selected ij> select c1+c2, sum(c3) from test group by c1+c2 having c1+c2 > 11; 1 |2 ----------------------- 12 |200 13 |202 2 rows selected ij> select c1+c2, sum(c3) from test group by c1, c1+c2 having c1+c2 > 11; ERROR 42X24: Column C2 is referenced in the HAVING clause but is not in the GROUP BY list. I think that the proper results for that last query should have been: 1 |2 ----------------------- 12 |100 12 |100 13 |202 3 rows selected We could either investigate HAVING behaviors as part of this JIRA, or we could open a new one. Attached TwoPassVisitorWithCommentsAndTests.diff,
which adds a brief comment to the GroupByNode code explaining the reasoning behind the two pass ResultColumn rewriting algorithm, and adds a number of new tests to GroupByExpressionTest.java. Thank you for the updated patch, Bryan. Is there a more detailed writeup forthcoming for this change, or are the code comments the extent of it?
The reason I ask is that, after reading the code comments and the comments in this issue, I can't quite see the correlation between the patch and the NPE that it fixes. That is, it's clear that the patch fixes the issue, but I think I'm missing the details on _why_. The code comments say: // then we don't want the replacement of the // simple column reference C1 to affect the // compound expression C1 * (C2 / 100). but it's not clear to me how replacement of the simple column reference can negatively affect the compound expressions. Is it an issue of VCN's pointing to the wrong place? If so, any idea as to why that happens? Similarly, in an earlier comment you noted: > I think it's instructive to note that, with this patch applied, the > following statement gets "b" and "a" backward: > ij> select b, a, count(*) from xx group by b, a; > B |A |3 > --------------------------------------------------------- > 2.0 |3.0 |1 Is it possible to say what it is about processing simple column references _after_ compound expressions that fixes this problem? Apologies if I'm missing something obvious. I'm not by any means opposed to the patch, I'm just hoping to gain an understanding about why it resolves the discussion/issues raised thus far for this issue... And thanks for your continued diligence with this one, Bryan. Attached is TwoPassForHavingClauseAlso.diff, which
addresses the problem I noted with the HAVING clause in an earlier comment. Army, thanks for having a look at the patch. I am still intending to complete a more thorough description of the "why" behind the changes that should hopefully make it more clear. But for now, two quick responses: 1) substituting the column references with VCN's during group-by rewriting changes which result set is pointed to, but when we process the C1 portion of the compound expression but not the compound expression as a whole, we never correct the C2 column reference (since C2 by itself is not a grouping column), leaving the compound expression with C1 pointing to the correct result set, but C2 pointing to a non-existing result set, causing an NPE at runtime. 2) the earlier comment about getting "b" and "a" backward was simply a red herring. That particular patch attempt (modifyVisitorDoesntWork.diff) was simply flat-out wrong. :) I'll try to get a more thorough writeup completed soon. Attached an attempt at a descriptive writeup. Please let me know if it
makes the patch more comprehensible. Thanks a ton for the detailed write-up, Bryan, that definitely makes it clear as to what is going on with this issue.
In reading through the document I gained an understanding of what the problem is, and then I began wondering if this is only an issue when a ColumnReference appears in a compound expression--or is it also an issue when one expression appears within another expression. I came up with the following query: select a*((a+b)/1.0), count(*) from xx group by a+b, a*((a+b)/1.0) which fails with an NPE even after TwoPassForHavingClauseAlso.diff is applied. I'm attaching d3094_followup.htm, which (in theory) walks through this query in the same way that your "notes.html" walked through another example. I did this mostly for my own understanding, but thought it might be useful to post it as a writeup to this issue. To quote the conclusion: "[T]he TwoPassVisitorWithCommentsAndTests.diff solves the specific query that was reported (and described in "notes.html"), and so it makes things better, which is great. Thus I think it would be fine to commit the patch as it is and either deal with queries like the one above in a follow-up patch, or file a separate Jira altogether." Thanks Army! That's a keen insight about the underlying issue, and
a great writeup and test case. Here's yet another variant that's hard to explain: the first statement works, the second is refused with error 42Y30: select (a+b)+c, count(*) from t group by c, a+b; select a+(b+c), count(*) from t group by c, a+b; I'm not quite sure how to push this problem further. The only idea I had was as follows: perhaps we could automatically and silently pull up *every* column reference that is present in any GROUP BY expression, and include that column as an extra column reference at the end of the group by list. So we could rewrite the above queries, for example, as: select (a+b)+c, count(*) from t group by c, a+b, a, b; select a+(b+c), count(*) from t group by c, a+b, a, b; That might mean we'd always have "enough" column references so that we could perform substitution on all the result columns in the select list successfully. Yet I'm nervous about this strategy; we wouldn't want these pulled up columns to be used for the actual grouping (because it could subdivide the actual groups further than the user intended), so we'd have to mark these columns as "special columns to be used for resolving column references, but not to be used for grouping." I'll continue to think about it, but I'm increasingly tempted to take your suggestion to be satisfied with the patch as is, and file some of these other issues separately to be studied more in the future. This is similar to the way that we marked as resolved, while shifting the unresolved portion out to Here's one other idea that might work:
- Define the "level" of a GROUP BY expression to be the number of column references in that expression. For example, in this clause: GROUP BY a, b, a+b, a+(b+c), a*(b/100.0), a*((a+b)/1.0) the level of the various elements in the list are: - a: 1 - b: 1 - a+b: 2 - a+(b+c): 3 - a*(b/100.0): 2 - a*((a+b)/1.0): 3 - Then, when performing VCN substitution into the parent PRNode's ResultColumnList entries from the GroupByNode's entries, process the entires in descending order by level, and left-to-right within the same level. So the entries from the GROUP BY above would be processed in the order: a+(b+c), a*((a+b)/1.0), a+b, a*(b/100.0), a, b Does this algorithm seem like it would be worth investigating? > Here's yet another variant that's hard to explain: the first statement
> works, the second is refused with error 42Y30: > > select (a+b)+c, count(*) from t group by c, a+b; > select a+(b+c), count(*) from t group by c, a+b; The tree for "(a+b)+c" contains the subtree "a+b" as the left-hand side of the top-most binary operator and "c" as the right-hand side. Since each of those matches a stand-alone expression in the GROUP BY, the equivalence checking for GROUP BY processing passes. But the tree for "a+(b+c)" has "a" on one side and "b+c" on the other, neither of which shows up as a stand-alone GROUP BY expression, so the second query is rejected. I agree it's a tad non-inuititive, but it appears to be "explainable", for what that's worth... > Here's one other idea that might work: [ snip ] > Does this algorithm seem like it would be worth investigating? Yes, I definitely think this is worth investigating. From what I can tell it solves the problem of sub-expression substitution, and it seems a tad safer than pulling unneeded columns into the GROUP BY expression. There's a CollectNodesVisitor class which finds all instances of a specified class within the subtree beneath a given ValueNode, so you might be able to use that to count the number of ColumnReferences in the expression. Though ColumnReferences tend to appear in lots of places, so it'll be interesting to see if a simple CollectNodesVisitor is good enough--or will it end up counting ColumnReferences that shouldn't be counted...? In any event, +1 to further pursuit of the suggested approach. Attached is sortExpressions.diff, which proposes to sort
the expressions based on the number of ColumnReferences that each contains, using the technique suggested by Army of counting those refs using the CollectNodesVisitor. It seems to handle all the existing test cases, as well as a few new ones that I added to GroupByExpressionTest.java as part of this patch. Please have a look and tell me what you think. sortExpressions.diff looks good to me. Thanks for continuing with that approach, Bryan. Assuming derbyall and suites.All run cleanly, I say +1 to commit since it fixes all of the known queries and seems algorithmically correct as far as I can tell. If additional issues/queries are discovered later, I think they can be addressed as they are found.
One nit: it seems the creation of a new ArrayList for havingRefsToSubstitute could be conditional upon the presence of a non-null havingClause, and that a single ExpressionSorter() could be used instead of two. But in both cases the code may be easier to read as it is, so I'm not sure it's really worth changing... Thanks Army, those are good suggestions.
I'm intending to move ahead with final testing, cleanup, and commit. Attached is sortExpressionsFinal.diff, which incorporates
the last few suggestions from reviews, as well as some additional comments. derbyall and junit-all were cleaning, so I'm proceeding to commit this patch. Committed sortExpressionsFinal.diff to the trunk as revision 632221.
I'll investigate merging this change back to the 10.3 branch. Until then, I'll leave this issue open. I merged revision 632221 to the 10.3 branch without incident.
My build and test runs were clean, so I committed the change to the 10.3 branch as revision 632779. Marking the issue as resolved. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
DERBY-2352, which may have a different error messagebecause it's in a SANE build and this one is in an INSANE build.