Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
Normal
-
Deviation from standard
Description
Presently, Derby rejects column references in a SELECT .. GROUP BY if the column reference isn't a grouping column. This was correct SQL as per SQL 1992, but later versions of the standard has loosened this to the present wording (e.g. SQL 2003, section 7.12 <query specification>, SR 15:
"If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list> , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS."
This can be useful in certain queries, cf. the example below culled from http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html
SELECT f.film_id,
f.title,
COUNT(fa.actor_id)
FROM film f
LEFT JOIN film_actor fa
ON f.film_id = fa.film_id
GROUP BY f.film_id;
In this case, f.title is functionally dependent on f.film_id (primary key), so the query is correct according to SQL 2003 and later.
Derby requires that f.title also be specified as a grouping column, which is more verbose, but can also have performance implications (although I didn't attempt to measure how this could impact Derby yet), at least if the functional dependency analysis is not performed to eliminate the extra grouping column. Do we do any such analysis?
Another example, a generated column would also be functionally dependent on the columns used to compute it, cf. section 4.8.3 "Known functional dependencies in a base table".