Issue Details (XML | Word | Printable)

Key: DERBY-4071
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Dag H. Wanvik
Reporter: Aaron Digulla
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Derby

AssertFailure when selecting rows from a table with CHARACTER and VARCHAR columns

Created: 25/Feb/09 04:09 PM   Updated: Friday 11:33 PM
Component/s: SQL
Affects Version/s: 10.3.1.4, 10.3.2.1, 10.3.3.0, 10.4.1.3, 10.4.2.0
Fix Version/s: 10.4.2.1, 10.4.3.0, 10.5.1.1, 10.6.0.0

Time Tracking:
Not Specified

File Attachments:
  Size
File Licensed for inclusion in ASF works derby-4071-10_4.diff 2009-03-19 03:44 PM Dag H. Wanvik 5 kB
File Licensed for inclusion in ASF works derby-4071-10_4.stat 2009-03-19 03:44 PM Dag H. Wanvik 0.2 kB
File Licensed for inclusion in ASF works derby-4071.diff 2009-03-10 05:52 AM Dag H. Wanvik 5 kB
File Licensed for inclusion in ASF works derby-4071.stat 2009-03-10 05:52 AM Dag H. Wanvik 0.2 kB
Java Source File Licensed for inclusion in ASF works DerbyTest.java 2009-03-04 10:10 AM Aaron Digulla 9 kB
File Licensed for inclusion in ASF works trialPatch.diff 2009-03-07 04:23 AM Dag H. Wanvik 2 kB
Issue Links:
Duplicate
 
Reference

Issue & fix info: High Value Fix
Bug behavior facts: Regression, Crash
Resolution Date: 19/Mar/09 03:45 PM


 Description  « Hide
When running a complex query on this table:

[code]
Create table DEMO.TEST (
    CHR CHARACTER(26) ,
    VCHR VARCHAR(25) )
[code]

then I get this exception:

AssertFailure: ASSERT FAILED col1.getClass() (class ...SQLChar) expected to be the same as col2.getClass() (class ....SQLVarchar)' was thrown while evaluating an expression.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Aaron Digulla added a comment - 25/Feb/09 04:10 PM
Testcase which demonstrates the behavior

Aaron Digulla made changes - 25/Feb/09 04:10 PM
Field Original Value New Value
Attachment DerbyTest.java [ 12400952 ]
Aaron Digulla added a comment - 25/Feb/09 04:11 PM - edited
This is the query:

[code]
SELECT *
FROM DEMO.TEST S
WHERE S.VCHR IN (
        SELECT VCHR
        FROM DEMO.TEST
        GROUP BY VCHR
        HAVING COUNT (VCHR) > 1
)
  AND CHR NOT IN (
        SELECT MAX(CHR)
        FROM DEMO.TEST T
        WHERE S.VCHR = T.VCHR
        GROUP BY T.VCHR
        HAVING COUNT(T.VCHR) > 1
)
[code]

When you swap the two columns, the errors goes away.

My problem: I'm using Derby to replicate a legacy DB2 database in my unit tests so I can create patches for an existing system. Therefore, I'd like to have a solution for Derby 10.4.2.0 instead of swapping the columns. Is there a chance for a quick fix?

Bryan Pendleton added a comment - 25/Feb/09 04:56 PM
I'm not sure what you mean by "swap the two columns". Can you explain further?

Also, have you experimented with using CAST() to make the data type conversion more explicit?

Aaron Digulla added a comment - 25/Feb/09 06:43 PM
I mean: Make VCHR the first column and CHR the second. Just download the test case, copy it into your Derby project and run it. It contains a "good" case and a "bad" case.

Bryan Pendleton added a comment - 25/Feb/09 08:15 PM
Thanks, I see what you mean now.

And, it fails for me, so thanks very much for the test case!

I'm afraid I don't know what's wrong, though.

Aaron Digulla added a comment - 26/Feb/09 08:21 AM - edited
Good. :) Is it broken in 10.5, too?

I've found another version that works:

[code]
        sql = "SELECT *\r\n" +
                "FROM DEMO.TEST2 S\r\n" +
                "WHERE S.VCHR IN (\r\n" +
                " SELECT VCHR\r\n" +
                " FROM DEMO.TEST2\r\n" +
                " GROUP BY VCHR\r\n" +
                " HAVING COUNT (VCHR) > 1 \r\n" +
                ")\r\n" +
                " AND S.CHR NOT IN (\r\n" + // Table S
                " SELECT MAX(T.CHR)\r\n" + // Table T
                " FROM DEMO.TEST2 T\r\n" +
                " WHERE S.VCHR = T.VCHR\r\n" +
                " GROUP BY T.VCHR\r\n" +
                " HAVING COUNT(T.VCHR) > 1 \r\n" +
                ")";
        i = dump (sql);
        assertEquals (1, i);
[code]

Change: I prefix every CHR column with the table alias. It seems that Derby somehow mixes the columns in the second sub-SELECT.

Aaron Digulla added a comment - 26/Feb/09 08:36 AM
Update: The quick fix doesn't work with my more complex real table. But it might get you on the right track.

Knut Anders Hatlen added a comment - 02/Mar/09 02:23 PM
One possible "quick fix" may be to use the production jar files instead of the debug jar files, as the production jars don't have the assertions.

I don't know if the assert failure exposes a bug or if it's just the assert that is too strict, but this comment indicates that the one who added the assert wasn't completely convinced that it was checking the right thing:

/**
Check that the columns in the row agree with the columns
in the template, both in number and in type.
<p>
XXX (nat) Currently checks that the classes implementing
each column are the same -- is this right?
**/

Aaron Digulla added a comment - 02/Mar/09 03:05 PM
In this case, I think it's unearthing something. If you look at my example, then I'm only comparing columns of the same type, so why is Derby comparing CHAR and VARCHAR internally? This looks like a column mixup.

Or maybe the MAX() casts CHAR to VARCHAR? But why would it do that? Does it also cast INT to LONG? etc.


Bryan Pendleton added a comment - 02/Mar/09 03:18 PM
I suspect you're right Aaron, it's some sort of a column mixup. During compilation, Derby
converts table and column references from their string form in the SQL text into an
internal numeric form which represents columns as ordinal positions into number
result sets in the query plan. Some parts of this conversion are extremely tricky, and
there have been bugs in this area in the past.

In particular, I find it interesting that your queries reference an expression in the HAVING
clause, and that expression does not also appear in the SELECT clause.

Does it change anything if you rewrite your query so that the HAVING expression also
appears in the SELECT list? I think this might require you to introduce yet another level
of sub-selects, so that you'd end up with something like:

    AND CHR NOT IN ( select a from (
        SELECT MAX(CHR) as a, count(t.vchr) as b
        FROM DEMO.TEST T
        WHERE S.VCHR = T.VCHR
        GROUP BY T.VCHR
        HAVING COUNT(T.VCHR) > 1 ) )



Aaron Digulla added a comment - 04/Mar/09 10:09 AM
I tried that and I got lots of other, weird errors. I've extended the test case.

First of all, you need to give the inner select a name or the SQL won't parse.

And after giving it a name, I get an error because S.VCHR is no longer known in the inner select. The final fix is this:

SELECT *
FROM DEMO.TEST S
WHERE S.VCHR IN (
        SELECT T1.VCHR
        FROM DEMO.TEST T1
        GROUP BY T1.VCHR
        HAVING COUNT (T1.VCHR) > 1
)
  AND S.CHR NOT IN ( select x.a from (
        SELECT MAX(T2.CHR) as a, COUNT(T2.VCHR) as b, T2.VCHR as c
        FROM DEMO.TEST T2
        GROUP BY T2.VCHR
        HAVING COUNT(T2.VCHR) > 1
) as x WHERE S.VCHR = x.c )

... drumroll ... which fails with the same error.

Aaron Digulla made changes - 04/Mar/09 10:09 AM
Attachment DerbyTest.java [ 12400952 ]
Aaron Digulla added a comment - 04/Mar/09 10:10 AM
New version of the test matching the comment "Aaron Digulla - 04/Mar/09 02:09 AM"

Aaron Digulla made changes - 04/Mar/09 10:10 AM
Attachment DerbyTest.java [ 12401394 ]
Mike Matrigali made changes - 04/Mar/09 10:15 PM
Derby Categories [Crash] [Crash, High Value Fix]
Dag H. Wanvik added a comment - 07/Mar/09 04:22 AM - edited
I find I can reproduce with this simpler query:

"SELECT * FROM DEMO.TEST S " +
            "WHERE CHR IN ( " +
            " SELECT MAX(CHR) " +
            " FROM DEMO.TEST T " +
            " WHERE S.VCHR = T.VCHR " +
            " GROUP BY T.VCHR " +
            " HAVING COUNT(T.VCHR) > 1 )"

I notice similarities with the erroneous mapArray index in DERBY-3880.

The ProjectRestrictResultSet under the GroupedAggregateResultSet
shares the result colum of the ##aggregate expression for COUNT
with the GroupedAggregateResultSet: That RC contains a VirtualColumnNode
which contains a column reference to the ##UnaggColumn (T.VCHR). When the
mapArray is contructed it picks of the virtual column id of the ##UnaggColumn (1), which
is wrong, since T.VCHR is column 2 in the underlying table.

I upload a hack, trialPatch.diff, which makes the above sample work, to illustrate the issue.
Not a solution, of course :)

The repro's method testAssertFailure doesn't crash now, although the
result assert fails: I do not get an empty result set, just looking
quickly at the data I think the query should return a row.

The repro's test method testParsesButFails now fails with another error (ClassCastException).






Dag H. Wanvik made changes - 07/Mar/09 04:23 AM
Attachment trialPatch.diff [ 12401673 ]

Dag H. Wanvik added a comment - 07/Mar/09 07:18 PM
I found I could simplify the query still further and get the error:

SELECT MAX(CHR) FROM DEMO.TEST T
    GROUP BY T.VCHR
    HAVING COUNT(T.VCHR) > 1

Stack trace extract:
.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:147)
.derby.impl.store.access.sort.MergeSort.checkColumnTypes(MergeSort.java:471)
.derby.impl.store.access.sort.MergeInserter.insert(MergeInserter.java:98)
.derby.impl.sql.execute.GroupedAggregateResultSet.loadSorter(GroupedAggregateResultSet.java:308)
.derby.impl.sql.execute.GroupedAggregateResultSet.openCore(GroupedAggregateResultSet.java:180)
.derby.impl.sql.execute.ProjectRestrictResultSet.openCore(ProjectRestrictResultSet.java:168)
.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(BasicNoPutResultSetImpl.java:245)
.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:416)
.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:297)
.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235)

Just wanted to add a little more detail, my previous comment was too
brief, I was getting sleepy :)

The inputRow from the underlying ProjectRestrictResultSet has wrong
type for ##aggregate expression for the count aggregator: CHR in stead
of VCHAR, due to the mapArray containing the wrong virtual column (1)
instead of 2, due to wrong contents in mapArray[2] used by
ProjectRestrictNode ca line 1422, to set up the referenced column
descriptor for runtime. This is picked up by ProjectRestrictResultSet,
ca line 107 with the call to getReferencedColumnPositions.

These two lines in ProjectRestrictNode:

// Map the result columns to the source columns
int[] mapArray = resultColumns.mapSourceColumns();

pick up the virtual column ids from ProjectRestrictNode#resultColumns.
The third of those (index 2), the ##aggregate expression for COUNT, is
the one I referred to in my previous post.

It seems it is the fact that the aggregator input expression (for COUNT) is also the column
we group by (a.k.a. ##UnaggColumn), which is the root of the problem here.. What do you think?
The group by rewriting is tricky..





Dag H. Wanvik added a comment - 07/Mar/09 07:34 PM
The following works, so it seems the HAVING clause
is needed for the problem to surface:

SELECT MAX(CHR), COUNT(T.VCHR) FROM DEMO.TEST T
     GROUP BY T.VCHR

Knut Anders Hatlen added a comment - 07/Mar/09 10:47 PM
Great research, Dag! For the record, I ran some tests and found out that the assert failure first appeared when the fix for DERBY-681 went in.

Knut Anders Hatlen made changes - 07/Mar/09 10:47 PM
Link This issue relates to DERBY-681 [ DERBY-681 ]
Kathey Marsden added a comment - 07/Mar/09 11:46 PM
Marking as regression. Knut found that this was introduced with the fix for DERBY-681

Kathey Marsden made changes - 07/Mar/09 11:46 PM
Derby Info [Regression]
Derby Categories [High Value Fix, Crash] [Crash, High Value Fix]
Dag H. Wanvik made changes - 10/Mar/09 02:43 AM
Assignee Dag H. Wanvik [ dagw ]
Dag H. Wanvik added a comment - 10/Mar/09 05:52 AM
I upload a patch, derby-4071, which solves the immediate problem, I
think. Running regressions now.

It contains a patch to GroupByNode as well as a new test case for
GroupByTest.

It essentially defers the replacement of group by expressions in the
having clause with pointers to the appropriate result column in the
group by node. The replacement used to happen at the end of
GroupByNode#addUnAggColumns. The patch moves substitution to after the
call to GroupByNode#addAggregateColumns has been performed.

Explanation of the error
-----------------------

The substitution is done by a SubstituteExpressionVisitor, which
replaces all occurences of the group by expression as described
above. In our case, however, this group by expression is an argument
to an aggregate function, so the having clause contains an
AggregateNode whose operand is the group by column. When the visitor
gets there, the aggregate node's operand is replaced as described,
cf. UnaryOperatorNode#accept (AggregateNode is a subclass).

There is a snag, however. This AggregateNode is aliased by the
aggregateNode we find in GroupByNode#aggregateVector. And we are not
done with using the information in aggregateVector yet when
addUnAggColumns has run.

Notably, in addAggregateColumns, the information on the aggregates are
needed, and the substitution described above gets in the way:

When constructing the aggregate expression (input) columns, there is a
call to

     aggregate.getNewExpressionResultColumn(dd)

This uses the operand field of the AggregateNode, whose value has just
been replaced for the purposes of the having clause. So, we end up
with a result column for aggregate expression which is wrong. The
result column list of GroupByNode ends up looking like this:

RCL (result column list)
   [0]: #UnaggColumn (the group by column)
      \
       CR
        \
        RC
          \
          VCN
            \
            RC (basetable)
           

   [1]: ##aggregate result
   [2]: ##aggregate expression
      \
      CR
       \
       RC
         \
         VCN
           \
           RCL[0] above, *error* That is, the RC of the group by node!

   [3]: aggregator


If things were OK we would expect to see:


RCL
   [0]: #UnaggColumn (the group by column)
       \
        :
   [1]: ##aggregate result
   [2]: ##aggregate expression
      \
      CR
       \
       RC (underlying ProjectRestrict)
         \
         CR
           \
           RC (bottom ProjectRestrict)
            \
            VCN
              \
              RC (basetable)
   [3]: aggregator

In the the underlying ProjectRestrictNode, which needs to set up the
mapArray to locate the correct column in the underlying base
table, this creates havoc:

The underlying PRN calls RCL.mapSourceColumns for every RC in the
ProjectRestrictNode's RCL (strip off a CR-RC level from GroupByNode's
RCL to picture ProjectRestrictNode's RCL).

mapSourceColumns extracts the virtual column id from a CR or a VCN. For
column 2 of the underlying ProjectRestrictNode, it sees

      [2]:
         \
         VCN
           \
           RCL[0]

and there it finds the virtual column number of 1, where we should
have seen:

      [2]:
         \
         CR
           \
           RC
            \
            VCN
              \
              RC (basetable)

and found the virtual column number 2.

With the patch, the RCLs end up as expected and the repro works.

Since the substitution "damages" the AggregateNode in the
aggregateVector, it struck me that any later usage might also be
affected, even with the patch. The aggregateVector is actually being
used again later, in considerPostOptimizeOptimizations. However, that
code only runs if there is no explicit group by, and max 1 aggregate
function which must be max/min, so I am not sure if this would ever be
an issue. And if so, it would only bar an optimization, not give a
wrong result...

Dag H. Wanvik made changes - 10/Mar/09 05:52 AM
Attachment derby-4071.stat [ 12401807 ]
Attachment derby-4071.diff [ 12401806 ]
Dag H. Wanvik made changes - 10/Mar/09 05:52 AM
Derby Info [Regression] [Patch Available, Regression]
Aaron Digulla added a comment - 10/Mar/09 01:54 PM - edited
Thanks for the patch! I've successfully applied it to the official 10.4.2.0 release (only the GroupByNode.java; the tests fail) and my application works now!

Do I have to close this bug as resolved?

Kristian Waagan added a comment - 10/Mar/09 02:36 PM
Normally the developer resolves the issue when he/she, or the community, feels that the issue has been properly addressed.
Then the reporter should verify the fix and close the issue if the problem is no longer seen.
This isn't a strict process, but what happens most of the time.

As far as I understand, you have already verified that the patch fixes the problem you experienced, but I'm not sure if Dag is done working on this issue yet. The code hasn't been committed to neither the trunk nor the 10.4 branch.
My opinion is that it is a little early to declare victory, but your feedback on the patch is very welcome and may speed up the process :)

Thank you for taking the time to report bugs and testing out patches!

Dag H. Wanvik added a comment - 10/Mar/09 03:55 PM
Regressions passed; please review.

Dag H. Wanvik added a comment - 10/Mar/09 03:59 PM
Thanks for testing the patch, Aaron! Glad it works for 10.4.2 branch also, I didn't get to test that yet.
I will probably back-port the final patch to the 10.4 branch also.

Bryan Pendleton added a comment - 10/Mar/09 04:36 PM
Hi Dag, thanks for the very thorough writeup and notes. Your analysis makes a lot of sense to me.

I'm wondering: with your patch, do we still need the DERBY-3880 patch? That is,
what happens if you apply your patch, and *undo* the DERBY-3880 patch? Do the
DERBY-4071 and DERBY-3880 test cases then pass? Or do we actually need both
patches to get the column references in the HAVING clause expressions to be
mapped correctly?

Dag H. Wanvik added a comment - 10/Mar/09 05:31 PM
Thanks for looking at the patch, Bryan! The DERBY-3880 patch is still needed, I find;
without it the test case testHavingWithInnerJoinDerby3880 starts failing again.


Dag H. Wanvik added a comment - 11/Mar/09 12:44 AM
I have read through DERBY-3880 and also DERBY-3094 and managed to convince myself
that the current issue is related but separate, for what it's worth. Thanks for the suggestion, Bryan.
I wish I had read both those issues (nice write-ups!) better before undertaking this one :)

Repository Revision Date User Message
ASF #754579 Sun Mar 15 01:30:42 UTC 2009 dag DERBY-4071 AssertFailure when selecting rows from a table with CHARACTER and VARCHAR columns

Patch DERBY-4071 which fixes this issue.
Files Changed
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/GroupByNode.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/lang/GroupByTest.java

Dag H. Wanvik added a comment - 15/Mar/09 01:31 AM
Committed this patch as svn 754579 on trunk.


Dag H. Wanvik made changes - 15/Mar/09 01:31 AM
Derby Info [Regression, Patch Available] [Regression]
Repository Revision Date User Message
ASF #756052 Thu Mar 19 15:43:17 UTC 2009 dag DERBY-4071 AssertFailure when selecting rows from a table with CHARACTER and VARCHAR columns

Patch DERBY-4071-10_4 backport the trunk patch to 10.4 branch.
Files Changed
MODIFY /db/derby/code/branches/10.4/java/engine/org/apache/derby/impl/sql/compile/GroupByNode.java
MODIFY /db/derby/code/branches/10.4/java/testing/org/apache/derbyTesting/functionTests/tests/lang/GroupByTest.java

Dag H. Wanvik added a comment - 19/Mar/09 03:44 PM
Committed derby-4071-10_4 to the 10.4 branch as svn 756052 (It is a simple merge,
but the svn merge command didn't give a clean result this time, so I upload an explicit patch),
resolving.

Dag H. Wanvik made changes - 19/Mar/09 03:44 PM
Attachment derby-4071-10_4.stat [ 12402581 ]
Attachment derby-4071-10_4.diff [ 12402580 ]
Dag H. Wanvik added a comment - 19/Mar/09 03:45 PM
Fixed on 10.4, 10.5 and trunk, so resolving. Aaron, feel free to close the issue now.

Dag H. Wanvik made changes - 19/Mar/09 03:45 PM
Resolution Fixed [ 1 ]
Status Open [ 1 ] Resolved [ 5 ]
Dag H. Wanvik made changes - 19/Mar/09 03:47 PM
Fix Version/s 10.6.0.0 [ 12313727 ]
Fix Version/s 10.5.0.0 [ 12313010 ]
Fix Version/s 10.4.3.0 [ 12313654 ]
Fix Version/s 10.4.2.1 [ 12313401 ]
Aaron Digulla added a comment - 19/Mar/09 08:05 PM
Thanks again for the quick fix!

Aaron Digulla made changes - 19/Mar/09 08:05 PM
Status Resolved [ 5 ] Closed [ 6 ]
Myrna van Lunteren made changes - 04/May/09 06:22 PM
Fix Version/s 10.5.0.0 [ 12313010 ]
Fix Version/s 10.5.1.1 [ 12313771 ]
Knut Anders Hatlen made changes - 25/May/09 06:45 PM
Link This issue is related to DERBY-4247 [ DERBY-4247 ]
Mike Matrigali made changes - 10/Jun/09 09:48 PM
Link This issue is duplicated by DERBY-4247 [ DERBY-4247 ]
Dag H. Wanvik made changes - 30/Jun/09 03:55 PM
Bug behavior facts [High Value Fix, Crash] [Regression]
Dag H. Wanvik made changes - 01/Jul/09 04:24 PM
Issue & fix info [High Value Fix]
Bug behavior facts [Regression] [Crash, Regression]
Dag H. Wanvik added a comment - 20/Nov/09 11:33 PM
This bug is also seen in 10.3 (feel free to reopen and backport if desired).

Dag H. Wanvik made changes - 20/Nov/09 11:33 PM
Affects Version/s 10.3.1.4 [ 12312590 ]
Affects Version/s 10.3.2.1 [ 12312876 ]
Affects Version/s 10.3.3.0 [ 12313142 ]
Affects Version/s 10.4.1.3 [ 12313111 ]