Issue Details (XML | Word | Printable)

Key: DERBY-1624
Type: Improvement Improvement
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Manish Khettry
Reporter: Emmanuel Bernard
Votes: 3
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Derby

use of direct column name rather than alias make aggregation fail (Hibernate depends on that)

Created: 01/Aug/06 06:00 PM   Updated: 17/Feb/08 08:07 PM
Return to search
Component/s: SQL
Affects Version/s: 10.1.1.0, 10.1.3.1
Fix Version/s: 10.3.1.4

Time Tracking:
Not Specified

File Attachments:
  Size
File Licensed for inclusion in ASF works 1624_repro.sql 2006-10-07 04:48 AM Andrew McIntyre 2 kB
Issue Links:
Reference

Resolution Date: 15/Mar/07 10:45 PM


 Description  « Hide
Error: org.apache.derby.client.am.SqlException: Column 'MODEL0_.COL_0_0_' is either not in any table in the FROM list or appears within a join specification and is outside the scope of the join specification or appears in a HAVING clause and is not in the GROUP BY list. If this is a CREATE or ALTER TABLE statement then 'MODEL0_.COL_0_0_' is not a column in the target table., SQL State: 42X04, Error Code: -1

for

select
        model0_.balance as col_0_0_,
        count(*) as col_1_0_
    from
        account model0_
    group by
        model0_.balance
    having
        count(*) > 1



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Emmanuel Bernard added a comment - 01/Aug/06 06:01 PM
Somehow related to DERBY-127

Emmanuel Bernard added a comment - 01/Aug/06 06:03 PM
All but Derby DB works fine with it

original issue from the hibernate dev list.


I am working on enhancing Derby support a little bit, but have run into
an issue with their syntax that I am unable to figure out. I was hoping
someone on this list was familiar enough with Derby to point me in the
right direction.

Specifically, I am trying to properly deal with the manner in which
Derby (and also DB2 largely) expects columns to be referenced in certain
clauses. For example, because Hibernate always aliases columns in the
select clause, derby requires that those aliases be used in certain
later clauses. The query I am trying to work through right now is as
follows:
    select
        model0_.name as col_0_0_,
        count(*) as col_1_0_
    from
        Model model0_
    group by
        model0_.name
    having
        count(*) > 1

However, I get errors from Derby when passing this to the DB:
ERROR 42X04: Column 'MODEL0_.COL_0_0_' is either not in any table in the
FROM list or appears within a join specification and is outside the
scope of the join specification or appears in a HAVING clause and is not
in the GROUP BY list. If this is a CREATE or ALTER TABLE statement then
'MODEL0_.COL_0_0_' is not a column in the target table.

If the having clause is removed, the query parses fine; I have tried
various incantations regarding how to define the having clause without
avail.

This query seems taken almost verbatim from their reference docs, yet I
cannot get this to work...
http://db.apache.org/derby/docs/10.1/ref/rrefselectexpression.html

Any thoughts?

Manish Khettry added a comment - 16/Aug/06 06:14 PM
This has to do with the way Derby rewrites querys with a groupby. What I find odd is that the query works without the having clause because the same rewrite is done for only group by's. I tracked it down to this bit of code in FromSubquery#findMatching column. So, when we are trying to bind "model0_.name" in the select list why look in different places (case 2 vs case 4) depending on whether there is a having clause or not?!

/* We have 5 cases here:
* 1. ColumnReference was generated to replace an aggregate.
* (We are the wrapper for a HAVING clause and the ColumnReference
* was generated to reference the aggregate which was pushed down into
* the SELECT list in the user's query.)
* Just do what you would expect. Try to resolve the
* ColumnReference against our RCL if the ColumnReference is unqualified
* or if it is qualified with our exposed name.
* 2. We are the wrapper for a GROUP BY and a HAVING clause and
* either the ColumnReference is qualified or it is in
* the HAVING clause. For example:
* select a from t1 group by a having t1.a = 1
* select a as asdf from t1 group by a having a = 1
* We need to match against the underlying FromList and then find
* the grandparent ResultColumn in our RCL so that we return a
* ResultColumn from the correct ResultSetNode. It is okay not to
* find a matching grandparent node. In fact, this is how we ensure
* the correct semantics for ColumnReferences in the HAVING clause
* (which must be bound against the GROUP BY list.)
* 3. We are the wrapper for a HAVING clause without a GROUP BY and
* the ColumnReference is from the HAVING clause. ColumnReferences
* are invalid in this case, so we return null.
* 4. We are the wrapper for a GROUP BY with no HAVING. This has
* to be a separate case because of #5 and the following query:
* select * from (select c1 from t1) t, (select c1 from t1) tt
* group by t1.c1, tt.c1
* (The correlation names are lost in the generated FromSuquery.)
* 5. Everything else - do what you would expect. Try to resolve the
* ColumnReference against our RCL if the ColumnReference is unqualified
* or if it is qualified with our exposed name.
*/
if (columnReference.getGeneratedToReplaceAggregate()) // 1
{
resultColumn = resultColumns.getResultColumn(columnReference.getColumnName());
}
else if (generatedForGroupByClause && generatedForHavingClause &&
columnReference.getClause() != ValueNode.IN_SELECT_LIST) // 2
{
if (SanityManager.DEBUG)
{
SanityManager.ASSERT(correlationName == null,
"correlationName expected to be null");
SanityManager.ASSERT(subquery instanceof SelectNode,
"subquery expected to be instanceof SelectNode, not " +
subquery.getClass().getName());
}

SelectNode select = (SelectNode) subquery;

resultColumn = select.getFromList().bindColumnReference(columnReference);

/* Find and return the matching RC from our RCL.
* (Not an error if no match found. Let ColumnReference deal with it.
*/
if (resultColumn != null)
{
/* Is there a matching resultColumn in the subquery's RCL? */
resultColumn = subquery.getResultColumns().findParentResultColumn(
resultColumn);
if (resultColumn != null)
{
/* Is there a matching resultColumn in our RCL? */
resultColumn = resultColumns.findParentResultColumn(
resultColumn);
}
}
}
else if ((generatedForHavingClause && ! generatedForGroupByClause) // 3
&& (columnReference.getClause() != ValueNode.IN_SELECT_LIST) )
{
resultColumn = null;
}
else if (generatedForGroupByClause) // 4
{
resultColumn = resultColumns.getResultColumn(
columnsTableName,
columnReference.getColumnName());
}
else if (columnsTableName == null || columnsTableName.equals(correlationName)) // 5?
{
resultColumn = resultColumns.getAtMostOneResultColumn(columnReference, correlationName);
}


if (resultColumn != null)
{
columnReference.setTableNumber(tableNumber);
}

return resultColumn;
}

Emmanuel, Is this a big problem for Hibernate with Derby? Others on the derby list who may know this-- how hard is it to ditch the rewrite of groupby/having queries or can we still keep the rewrite and fix this?

Emmanuel Bernard added a comment - 16/Aug/06 06:49 PM
Yes it's a pretty big one :-)
Currently people cannot have query with both having and group by.

From our side, this is clearly not an easy fix. It will require to work quite hard on the AST translator phases, changing the way queries are generated leading to lot's of QA before releasing that.

Steve Ebersole added a comment - 16/Aug/06 07:00 PM
Well first I am not even sure what the "correct syntax" is that Derby is expecting here. But if there is a syntax that actually worked then I could try to morph the query into that form prior to sending it. But that is a long term solution as it would most definitely require some of the HQL translator changes to implement correctly.

Manish Khettry added a comment - 18/Aug/06 01:01 AM
Here is a description of the problem.

Derby rewrite queries involving group by's by adding an outer select and transforming the having clause to a where clause in the outer query, with some modifications. So in this case, given a query like:

select alias.x as c0, count(*) as c1
from foo alias
group by alias.x having count(*) > 0 ;

gets rewritten to:

select * from (select alias.x as c0, count(*) as c1 ... ) where generated_col > 0;

Subsequently, the "*" in the outer query gets expanded to:

select alias.c0, c1 from (....) where generated_col > 0;

To me this seems a bit fishy-- it looks like "alias" is scoped in the subquery and not really visible in the outer query. For this reason, the following query also fails (users have to add a correlation name for subselects).

select alias.c0 from (select alias.x as c0 from foo alias) tabname;

Now the funny thing is that it passes without a having clause, the queyr works and thats because of the rather obfuscated 5 way search for column references in FromSubquery. Notice the huge difference in the search code between cases 2 and 5. BTW, the code I pasted in the previous is not whats in the codeline-- I was playing aroudn with it, so please look at the checked in code).

One fishy thing I came across was the notion of a "clause" in a ValueNode; i.e IN_SELECT_LIST, IN_WHERE_CLAUSE, IN_HAVING_CLAUSE etc. First we only seem to use IN_SELECT_LIST and I do not see (unless I'm missing something) the last two. Second, what exactly are the semantics of the clause instance variable? IN a query like this, to which clause does the expression "expr" belong?

select .. from ... having c0 > (select max(expr) ...);

Is "expr" in a having clause or a select clause? I think answering this would help because the column binding logic depends on this-- look at case 2, again: Should expr be considered to be in a having clause and therefore bound by this bit of code?

else if (generatedForGroupByClause && generatedForHavingClause &&
(columnsTableName != null ||
columnReference.getClause() != ValueNode.IN_SELECT_LIST)) // 2
{

Anyway, I realize that I am posing more questions than providing solutions and that all of this is deep down in the innards of the query parsing/binding code which most of us are only incompletely familiar with but if you have any insights, please update the bug.

I'd hate to see hibernate unusable with Derby due to bugs like this.



Andrew McIntyre added a comment - 05/Oct/06 06:19 AM
Stumbled across this issue whilst searching for something else. Following Manish's query-rewriting example, the original query:

select
        model0_.name as col_0_0_,
        count(*) as col_1_0_
    from
        Model model0_
    group by
        model0_.name
    having
        count(*) > 1

Could be rewritten as:

select * from
    (
        select
            model0_.name as col_0_0_,
            count(*) as col_1_0_
        from
            model model0_
        group by
            model0_.name
    ) as
        model0_ (col_0_0_, col_1_0_)
    where col_1_0_ > 1;

Unless I've missed something (which is likely, btw), I think this reproduces the desired results with the current code. I realize this is hardly ideal, but it seems that all the necessary pieces would be there: the table name, column names, and identical results to what (I think) are expected.

One possible fix would be for the code that rewrites the group-by-with-having predicate as a subquery to push the correlation names in the rewritten subquery out as correlation names for the subquery. After looking at the code in sqlgrammar.jj, this would appear to be a non-trivial, but doable, fix. A little further investigation revealed some discussion about a related issue (DERBY-280) and there is already a JIRA filed for rethinking the parsers handling of queries with GROUP BY/HAVING that is filed as DERBY-681.


Andrew McIntyre added a comment - 07/Oct/06 04:48 AM
While poking at this issue, I came up with a bunch of simple testcases thatI thought should all pass, but out of 18, only 7 pass, and those mostly because they were crafted to avoid this issue and demonstrate the correct results. I don't think I'll have time to work on this any time soon, but I think it demonstrates how broken the result column resolution is for columns with correlation names, especially when there is a HAVING clause (and even one rewrite to a subquery that I thought would work, but didn't).

Anyway, if anyone does pick this up anytime soon, I think the testcases provided here will be a useful guide towards getting things working. It might be useful one to pick up, as well, since it will improve Derby's usability with Hibernate.

Andrew McIntyre added a comment - 15/Mar/07 10:45 PM
Fixed by changes committed for DERBY-681. Will open a new issue for the column aliasing issues remaining in the repro script attached to this issue.