|
[
Permalink
| « Hide
]
Aaron Digulla added a comment - 25/Feb/09 04:10 PM
Testcase which demonstrates the behavior
This is the query:
[code] SELECT * FROM DEMO.TEST S WHERE S.VCHR IN ( SELECT VCHR FROM DEMO.TEST GROUP BY VCHR HAVING COUNT (VCHR) > 1 ) AND CHR NOT IN ( SELECT MAX(CHR) FROM DEMO.TEST T WHERE S.VCHR = T.VCHR GROUP BY T.VCHR HAVING COUNT(T.VCHR) > 1 ) [code] When you swap the two columns, the errors goes away. My problem: I'm using Derby to replicate a legacy DB2 database in my unit tests so I can create patches for an existing system. Therefore, I'd like to have a solution for Derby 10.4.2.0 instead of swapping the columns. Is there a chance for a quick fix? I'm not sure what you mean by "swap the two columns". Can you explain further?
Also, have you experimented with using CAST() to make the data type conversion more explicit? I mean: Make VCHR the first column and CHR the second. Just download the test case, copy it into your Derby project and run it. It contains a "good" case and a "bad" case.
Thanks, I see what you mean now.
And, it fails for me, so thanks very much for the test case! I'm afraid I don't know what's wrong, though. Good. :) Is it broken in 10.5, too?
I've found another version that works: [code] sql = "SELECT *\r\n" + "FROM DEMO.TEST2 S\r\n" + "WHERE S.VCHR IN (\r\n" + " SELECT VCHR\r\n" + " FROM DEMO.TEST2\r\n" + " GROUP BY VCHR\r\n" + " HAVING COUNT (VCHR) > 1 \r\n" + ")\r\n" + " AND S.CHR NOT IN (\r\n" + // Table S " SELECT MAX(T.CHR)\r\n" + // Table T " FROM DEMO.TEST2 T\r\n" + " WHERE S.VCHR = T.VCHR\r\n" + " GROUP BY T.VCHR\r\n" + " HAVING COUNT(T.VCHR) > 1 \r\n" + ")"; i = dump (sql); assertEquals (1, i); [code] Change: I prefix every CHR column with the table alias. It seems that Derby somehow mixes the columns in the second sub-SELECT. Update: The quick fix doesn't work with my more complex real table. But it might get you on the right track.
One possible "quick fix" may be to use the production jar files instead of the debug jar files, as the production jars don't have the assertions.
I don't know if the assert failure exposes a bug or if it's just the assert that is too strict, but this comment indicates that the one who added the assert wasn't completely convinced that it was checking the right thing: /** Check that the columns in the row agree with the columns in the template, both in number and in type. <p> XXX (nat) Currently checks that the classes implementing each column are the same -- is this right? **/ In this case, I think it's unearthing something. If you look at my example, then I'm only comparing columns of the same type, so why is Derby comparing CHAR and VARCHAR internally? This looks like a column mixup.
Or maybe the MAX() casts CHAR to VARCHAR? But why would it do that? Does it also cast INT to LONG? etc. I suspect you're right Aaron, it's some sort of a column mixup. During compilation, Derby
converts table and column references from their string form in the SQL text into an internal numeric form which represents columns as ordinal positions into number result sets in the query plan. Some parts of this conversion are extremely tricky, and there have been bugs in this area in the past. In particular, I find it interesting that your queries reference an expression in the HAVING clause, and that expression does not also appear in the SELECT clause. Does it change anything if you rewrite your query so that the HAVING expression also appears in the SELECT list? I think this might require you to introduce yet another level of sub-selects, so that you'd end up with something like: AND CHR NOT IN ( select a from ( SELECT MAX(CHR) as a, count(t.vchr) as b FROM DEMO.TEST T WHERE S.VCHR = T.VCHR GROUP BY T.VCHR HAVING COUNT(T.VCHR) > 1 ) ) I tried that and I got lots of other, weird errors. I've extended the test case.
First of all, you need to give the inner select a name or the SQL won't parse. And after giving it a name, I get an error because S.VCHR is no longer known in the inner select. The final fix is this: SELECT * FROM DEMO.TEST S WHERE S.VCHR IN ( SELECT T1.VCHR FROM DEMO.TEST T1 GROUP BY T1.VCHR HAVING COUNT (T1.VCHR) > 1 ) AND S.CHR NOT IN ( select x.a from ( SELECT MAX(T2.CHR) as a, COUNT(T2.VCHR) as b, T2.VCHR as c FROM DEMO.TEST T2 GROUP BY T2.VCHR HAVING COUNT(T2.VCHR) > 1 ) as x WHERE S.VCHR = x.c ) ... drumroll ... which fails with the same error. New version of the test matching the comment "Aaron Digulla - 04/Mar/09 02:09 AM"
I find I can reproduce with this simpler query:
"SELECT * FROM DEMO.TEST S " + "WHERE CHR IN ( " + " SELECT MAX(CHR) " + " FROM DEMO.TEST T " + " WHERE S.VCHR = T.VCHR " + " GROUP BY T.VCHR " + " HAVING COUNT(T.VCHR) > 1 )" I notice similarities with the erroneous mapArray index in The ProjectRestrictResultSet under the GroupedAggregateResultSet shares the result colum of the ##aggregate expression for COUNT with the GroupedAggregateResultSet: That RC contains a VirtualColumnNode which contains a column reference to the ##UnaggColumn (T.VCHR). When the mapArray is contructed it picks of the virtual column id of the ##UnaggColumn (1), which is wrong, since T.VCHR is column 2 in the underlying table. I upload a hack, trialPatch.diff, which makes the above sample work, to illustrate the issue. Not a solution, of course :) The repro's method testAssertFailure doesn't crash now, although the result assert fails: I do not get an empty result set, just looking quickly at the data I think the query should return a row. The repro's test method testParsesButFails now fails with another error (ClassCastException). Link to relevant comment on
https://issues.apache.org/jira/browse/DERBY-3880?focusedCommentId=12639647#action_12639647 I found I could simplify the query still further and get the error:
SELECT MAX(CHR) FROM DEMO.TEST T GROUP BY T.VCHR HAVING COUNT(T.VCHR) > 1 Stack trace extract: .derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:147) .derby.impl.store.access.sort.MergeSort.checkColumnTypes(MergeSort.java:471) .derby.impl.store.access.sort.MergeInserter.insert(MergeInserter.java:98) .derby.impl.sql.execute.GroupedAggregateResultSet.loadSorter(GroupedAggregateResultSet.java:308) .derby.impl.sql.execute.GroupedAggregateResultSet.openCore(GroupedAggregateResultSet.java:180) .derby.impl.sql.execute.ProjectRestrictResultSet.openCore(ProjectRestrictResultSet.java:168) .derby.impl.sql.execute.BasicNoPutResultSetImpl.open(BasicNoPutResultSetImpl.java:245) .derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:416) .derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:297) .derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235) Just wanted to add a little more detail, my previous comment was too brief, I was getting sleepy :) The inputRow from the underlying ProjectRestrictResultSet has wrong type for ##aggregate expression for the count aggregator: CHR in stead of VCHAR, due to the mapArray containing the wrong virtual column (1) instead of 2, due to wrong contents in mapArray[2] used by ProjectRestrictNode ca line 1422, to set up the referenced column descriptor for runtime. This is picked up by ProjectRestrictResultSet, ca line 107 with the call to getReferencedColumnPositions. These two lines in ProjectRestrictNode: // Map the result columns to the source columns int[] mapArray = resultColumns.mapSourceColumns(); pick up the virtual column ids from ProjectRestrictNode#resultColumns. The third of those (index 2), the ##aggregate expression for COUNT, is the one I referred to in my previous post. It seems it is the fact that the aggregator input expression (for COUNT) is also the column we group by (a.k.a. ##UnaggColumn), which is the root of the problem here.. What do you think? The group by rewriting is tricky.. The following works, so it seems the HAVING clause
is needed for the problem to surface: SELECT MAX(CHR), COUNT(T.VCHR) FROM DEMO.TEST T GROUP BY T.VCHR Great research, Dag! For the record, I ran some tests and found out that the assert failure first appeared when the fix for
Marking as regression. Knut found that this was introduced with the fix for
I upload a patch, derby-4071, which solves the immediate problem, I
think. Running regressions now. It contains a patch to GroupByNode as well as a new test case for GroupByTest. It essentially defers the replacement of group by expressions in the having clause with pointers to the appropriate result column in the group by node. The replacement used to happen at the end of GroupByNode#addUnAggColumns. The patch moves substitution to after the call to GroupByNode#addAggregateColumns has been performed. Explanation of the error ----------------------- The substitution is done by a SubstituteExpressionVisitor, which replaces all occurences of the group by expression as described above. In our case, however, this group by expression is an argument to an aggregate function, so the having clause contains an AggregateNode whose operand is the group by column. When the visitor gets there, the aggregate node's operand is replaced as described, cf. UnaryOperatorNode#accept (AggregateNode is a subclass). There is a snag, however. This AggregateNode is aliased by the aggregateNode we find in GroupByNode#aggregateVector. And we are not done with using the information in aggregateVector yet when addUnAggColumns has run. Notably, in addAggregateColumns, the information on the aggregates are needed, and the substitution described above gets in the way: When constructing the aggregate expression (input) columns, there is a call to aggregate.getNewExpressionResultColumn(dd) This uses the operand field of the AggregateNode, whose value has just been replaced for the purposes of the having clause. So, we end up with a result column for aggregate expression which is wrong. The result column list of GroupByNode ends up looking like this: RCL (result column list) [0]: #UnaggColumn (the group by column) \ CR \ RC \ VCN \ RC (basetable) [1]: ##aggregate result [2]: ##aggregate expression \ CR \ RC \ VCN \ RCL[0] above, *error* That is, the RC of the group by node! [3]: aggregator If things were OK we would expect to see: RCL [0]: #UnaggColumn (the group by column) \ : [1]: ##aggregate result [2]: ##aggregate expression \ CR \ RC (underlying ProjectRestrict) \ CR \ RC (bottom ProjectRestrict) \ VCN \ RC (basetable) [3]: aggregator In the the underlying ProjectRestrictNode, which needs to set up the mapArray to locate the correct column in the underlying base table, this creates havoc: The underlying PRN calls RCL.mapSourceColumns for every RC in the ProjectRestrictNode's RCL (strip off a CR-RC level from GroupByNode's RCL to picture ProjectRestrictNode's RCL). mapSourceColumns extracts the virtual column id from a CR or a VCN. For column 2 of the underlying ProjectRestrictNode, it sees [2]: \ VCN \ RCL[0] and there it finds the virtual column number of 1, where we should have seen: [2]: \ CR \ RC \ VCN \ RC (basetable) and found the virtual column number 2. With the patch, the RCLs end up as expected and the repro works. Since the substitution "damages" the AggregateNode in the aggregateVector, it struck me that any later usage might also be affected, even with the patch. The aggregateVector is actually being used again later, in considerPostOptimizeOptimizations. However, that code only runs if there is no explicit group by, and max 1 aggregate function which must be max/min, so I am not sure if this would ever be an issue. And if so, it would only bar an optimization, not give a wrong result... Thanks for the patch! I've successfully applied it to the official 10.4.2.0 release (only the GroupByNode.java; the tests fail) and my application works now!
Do I have to close this bug as resolved? Normally the developer resolves the issue when he/she, or the community, feels that the issue has been properly addressed.
Then the reporter should verify the fix and close the issue if the problem is no longer seen. This isn't a strict process, but what happens most of the time. As far as I understand, you have already verified that the patch fixes the problem you experienced, but I'm not sure if Dag is done working on this issue yet. The code hasn't been committed to neither the trunk nor the 10.4 branch. My opinion is that it is a little early to declare victory, but your feedback on the patch is very welcome and may speed up the process :) Thank you for taking the time to report bugs and testing out patches! Regressions passed; please review.
Thanks for testing the patch, Aaron! Glad it works for 10.4.2 branch also, I didn't get to test that yet.
I will probably back-port the final patch to the 10.4 branch also. Hi Dag, thanks for the very thorough writeup and notes. Your analysis makes a lot of sense to me.
I'm wondering: with your patch, do we still need the what happens if you apply your patch, and *undo* the patches to get the column references in the HAVING clause expressions to be mapped correctly? Thanks for looking at the patch, Bryan! The
without it the test case testHavingWithInnerJoinDerby3880 starts failing again. I have read through
that the current issue is related but separate, for what it's worth. Thanks for the suggestion, Bryan. I wish I had read both those issues (nice write-ups!) better before undertaking this one :) Committed this patch as svn 754579 on trunk.
Committed derby-4071-10_4 to the 10.4 branch as svn 756052 (It is a simple merge,
but the svn merge command didn't give a clean result this time, so I upload an explicit patch), resolving. Fixed on 10.4, 10.5 and trunk, so resolving. Aaron, feel free to close the issue now.
Thanks again for the quick fix!
This bug is also seen in 10.3 (feel free to reopen and backport if desired).
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||