|
I agree, this was definitely caused by the
A possible workaround would be to include the order by expressions into the select list. That is, modify the query from: select distinct airline_full from airlines order by lower(airline_full); to select distinct airline_full, lower(airline_full) from airlines order by lower(airline_full); Changing the issue type to bug, since it is a regression
I think that these lines in OrderByColumn, which were added by
are the focus of this problem: if (addedColumnOffset >= 0 && target instanceof SelectNode && ( (SelectNode)target ).hasDistinct()) throw StandardException.newException(SQLState.LANG_DISTINCT_ORDER_BY_EXPRESSION); The problem is that these lines should have an additional condition, along the lines of: AND, there exists at least 1 column reference among the column references in this ORDER BY expression, which does not appear as one of the columns which are being selected DISTINCT. The problem is, I'm not sure how to code that AND test I'll have to study the SelectNode data structure to see if I can easily see how to tell "the columns which are being selected DISTINCT". Attached is 'allowExpressions.diff', a patch proposal.
This proposal backs out the line of code added by which was rejecting DISTINCT queries with ORDER BY expressions which did not appear explicitly in the SELECT list. With this patch, Derby once again allows a range of legitimate queries, such as: select distinct name from person order by lower(name) select distinct * from person order by age + 10 Unfortunately, with this patch, Derby now also allows the invalid query: select distinct name from person order by age*2 I haven't been able to quickly find an easy way to distinguish the valid queries from the invalid ones, and so as a short term measure I think it would be good to restore the previous (10.2 and prior) behavior for DISTINCT queries involving ORDER BY expressions. It was good that the queries, but it was far worse that the patch rejected some valid ones. I added a number of new test cases to illustrate the behavior. Please have a look at the patch and let us know what you think! I agree it would probably be good to allow the legal queries once again, even if the expense is allowing an illegal one.
That said we should ultimately reject the bad illegal ones. If that's our understanding, the patch looks ok. The changed test run successfully. If I understand correctly what you're looking for in the removed if statement is something like "if none of the columns in the orderby column list are in the target.rcl, and the target is a distinct query, then throw exception" ? Didn't the removed if statement really said the inverse? "if there is a column in the orderby columns that isn't in the target.rcl, and target is a distinct, then throw exception" Attached is 'mergeWith2351.diff', an update of the previous patch proposal
which resolves the file conflicts with the Hi Thomas, thanks for looking at the patch. I'm sorry I didn't get back to you sooner.
The "if" statement that I wish I could write is something like: if the query specified DISTINCT, and if this ORDER BY expression references any column which is not one of the DISTINCT columns, reject the query, for there may be multiple possible values of that column and we don't know which one to use for ordering the results. Here's a great explanation of the problem: https://issues.apache.org/jira/browse/DERBY-2351?focusedCommentId=12473871#action_12473871 The only difference in this case is that we're concerned with ORDER BY *expressions*, not simple column references, and so an example query might be: SELECT DISTINCT name FROM person ORDER BY age * 2 and we wouldn't know whether to order the results as: John (age 10*2), then Mary (age 20*2) or Mary (age 20*2), then John (age 30*2) What Derby *actually* does, with this patch in place, is to implicitly include (age*2) into the DISTINCT list, so it sees *both* John records and produces the results: John (age 10*2), Mary (age 20*2), John (age 30*2) That is, even though the user specified "DISTINCT name", Derby produces two rows with the same name. Committed the change to the trunk as revision 637529.
I'll investigate merging this change back to the 10.4 branch. I added 10.4 as a fix version. 10.3.2.2 is already listed, so if possible, I think it would be good to merge the fix to 10.3 as well.
Sorry. I changed the wrong issue. Changing the assignee back to Bryan.
Merged the trunk change to the 10.4 branch and committed as revision 639013.
Committed the patch to the 10.3 branch as revision 639935.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
------------------------------------------------------------------------
r555096 | bpendleton | 2007-07-11 00:06:02 +0200 (Wed, 11 Jul 2007) | 19 lines
DERBY-2351: Certain ORDER BY clauses should be rejected as invalidThis change modifies the ORDER BY clause so that it rejects certain
queries as invalid: specifically, queries which:
a) specify the set quantifier DISTINCT,
b) and also contain an ORDER BY clause which refers to a column
or expression which is not in the query result.
The problem with such queries is that we are told to return only
a single instance of the DISTINCT columns, but since the ORDER BY
clause refers to columns which are not in the DISTINCT set, if there
should be multiple candidate rows from which we choose the DISTINCT
result, we don't know which of those rows to use for the ORDER BY
processing.
When the DISTINCT and ORDER BY clauses are in conflict, Derby should
reject the query. This change modifies Derby to do so.
------------------------------------------------------------------------