Summary of Change

An ORDER BY clause of a DISTINCT query which specifies to order by a column which was not in the DISTINCT list is now rejected, because the intent of the query is ambiguous. Previously, Derby instead produced non-distinct results. Also, an ORDER BY clause which specifies a table-name-qualified column alias is now rejected as invalid, where previously it was accepted.

Symptoms Seen by Applications Affected by Change

New rules for DISTINCT and ORDER BY

Applications which specify certain combinations of SELECT DISTINCT with ORDER BY will now receive an error message, whereas formerly such applications received non-distinct results.

As an example, take the following:

create table person (name varchar(10), age int);
insert into person values ('John', 10);
insert into person values ('John', 30);
insert into person values ('Mary', 20);

SELECT DISTINCT name FROM person ORDER BY age;

The query above is now rejected, with the error message:

ERROR 42879: The ORDER BY clause may not contain column 'AGE', since the query specifies DISTINCT and that column does not appear in the query result.

If the AGE column is included in the DISTINCT list in the above query, there is no ambiguity

New column alias rules

Applications which specify a column alias for a column in the SELECT statement, and which specify an ORDER BY clause which specifies that column alias qualified by the table name, will now receive an error indicating that the ORDER BY clause is invalid.

As an example, take the following:

create table t1 (i int, j int);
select t1.id as idcolumn1, t1.id as idcolumn2 from t1 order by t1.idcolumn1, t1.idcolumn2;

This query is now rejected, as there is no column named 'idcolumn1' in table 't1'. The error message is:

ERROR 42X04: Column 'T1.IDCOLUMN1' is either not in any table in the FROM list or appears within a join specification and is outside the scope of the join specification or appears in a HAVING clause and is not in the GROUP BY list. If this is a CREATE or ALTER TABLE statement then 'T1.IDCOLUMN1' is not a column in the target table.

Valid forms of the query above are:

select t1.id as idcolumn1, t1.id as idcolumn2 from t1 order by idcolumn1, idcolumn2;

or

select t1.id as idcolumn1, t1.id as idcolumn2 from t1 order by t1.id, t1.id;

Rationale for Change

When the query ambiguously specifies both DISTINCT and ORDER BY, Derby was unsure whether to return the rows properly ordered, but non-distinct, or to return a distinct set of rows, but in an unknown order. Since no clear resolution of the ambiguity could be found, we chose instead to reject the query.

The rules for resolving column references in ORDER BY clauses have been enhanced to consider column aliases and column names more fully. Derby now uses different resolution rules depending on whether the ORDER BY column reference is table.column, or just column:

Application Changes Required

A query which specifies ordering by a non-distinct column should instead include the ORDER BY column in the DISTINCT list, to resolve the ambiguity about which values of that column should be used to distinctly identify the resulting rows.

A query which specifies table-name.alias-name should be rewritten to specify either simply alias-name, or table-name.column-name.