Derby
  1. Derby
  2. DERBY-6009

Need stricter checking of ORDER BY clause in VALUES expressions

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1, 10.3.3.0, 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0, 10.6.1.0, 10.6.2.1, 10.7.1.1, 10.8.1.2, 10.8.2.2, 10.9.1.0
    • Fix Version/s: 10.10.1.1
    • Component/s: SQL
    • Issue & fix info:
      Repro attached

      Description

      We only support column numbers in ORDER BY clauses in VALUES expression, as seen by this error message:

      ij> values 1,2 order by 1+2;
      ERROR 42878: The ORDER BY clause of a SELECT UNION statement only supports unqualified column references and column position numbers. Other expressions are not currently supported. (errorCode = 30000)

      However, the checks let some unsupported expressions through and produce strange results. For example:

      ij> values 1 order by 1+2;
      1 |2
      -----------------------
      1 |3

      1 row selected

      It should probably have raised the same exception as the first query. And if not, the result should only have had one column.

      And the next example should probably have raised a syntax error too, instead of a NullPointerException:

      ij> values 1 order by int(1);
      ERROR XJ001: Java exception: ': java.lang.NullPointerException'. (errorCode = 0)

      1. derby-6009.diff
        9 kB
        Dag H. Wanvik
      2. derby-6009.stat
        0.5 kB
        Dag H. Wanvik
      3. derby-6009b.diff
        9 kB
        Dag H. Wanvik
      4. derby-6009b.stat
        0.5 kB
        Dag H. Wanvik
      5. derby-6009c.diff
        9 kB
        Dag H. Wanvik
      6. derby-6009c.stat
        0.5 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Hide
          Kathey Marsden added a comment -

          I agree this does not look worthwhile to backport.

          Show
          Kathey Marsden added a comment - I agree this does not look worthwhile to backport.
          Hide
          Dag H. Wanvik added a comment -

          Candidate for back-porting, but it may be such a corner case it's not worth it.

          Show
          Dag H. Wanvik added a comment - Candidate for back-porting, but it may be such a corner case it's not worth it.
          Hide
          Dag H. Wanvik added a comment -

          Thanks, Knut. Uploading version c of this patch, which makes the method setOrderingWhat into a standard node "init" method (the explicit constructor method called by the node factory) with Javadoc explaining the parameter. Also added a comment as you suggest in OrderByColumn.
          Committed as svn 1428160.

          Show
          Dag H. Wanvik added a comment - Thanks, Knut. Uploading version c of this patch, which makes the method setOrderingWhat into a standard node "init" method (the explicit constructor method called by the node factory) with Javadoc explaining the parameter. Also added a comment as you suggest in OrderByColumn. Committed as svn 1428160.
          Hide
          Knut Anders Hatlen added a comment -

          The proposal sounds fine to me too.

          It might be worthwhile adding a short comment to the new check in OrderByColumn. Something like "In VALUES expressions, only allow ordering by column numbers" would suffice. And a javadoc comment describing OrderByList.setOrderingWhat() might also be in order, as the method name itself is a bit cryptic.

          Show
          Knut Anders Hatlen added a comment - The proposal sounds fine to me too. It might be worthwhile adding a short comment to the new check in OrderByColumn. Something like "In VALUES expressions, only allow ordering by column numbers" would suffice. And a javadoc comment describing OrderByList.setOrderingWhat() might also be in order, as the method name itself is a bit cryptic.
          Hide
          Dag H. Wanvik added a comment -

          Thanks, Bryan. Yes, it is indeed painful, especially when the old and the new features make code semantically ambiguous as it does here.. I wish we could remove this, but I suspect users would find that painful; other databases also retain the column number ordering, e.g. I checked and found it in PostgreSQL, Oracle and DB2 at least, so the standard's removal of this hasn't been a resounding success... I don't know if there is such a process, I suspect that's the kind of this they leave to implementers...

          Show
          Dag H. Wanvik added a comment - Thanks, Bryan. Yes, it is indeed painful, especially when the old and the new features make code semantically ambiguous as it does here.. I wish we could remove this, but I suspect users would find that painful; other databases also retain the column number ordering, e.g. I checked and found it in PostgreSQL, Oracle and DB2 at least, so the standard's removal of this hasn't been a resounding success... I don't know if there is such a process, I suspect that's the kind of this they leave to implementers...
          Hide
          Bryan Pendleton added a comment -

          Dag, your proposal sounds fine to me; I think that would be good behavior. Isn't it amazing how
          much trouble it is to retain ancient, no-longer-wanted features of a language, such as the ability
          to reference columns by ordinal position? I wonder if the SQL standard defines a process for
          deprecating unwanted features over time. We could start to build our own list of features we'd
          be willing to deprecate, should that ever become possible...

          Show
          Bryan Pendleton added a comment - Dag, your proposal sounds fine to me; I think that would be good behavior. Isn't it amazing how much trouble it is to retain ancient, no-longer-wanted features of a language, such as the ability to reference columns by ordinal position? I wonder if the SQL standard defines a process for deprecating unwanted features over time. We could start to build our own list of features we'd be willing to deprecate, should that ever become possible...
          Hide
          Dag H. Wanvik added a comment -

          Filed DERBY-6027 for the NPE issue and linking.

          Show
          Dag H. Wanvik added a comment - Filed DERBY-6027 for the NPE issue and linking.
          Hide
          Dag H. Wanvik added a comment -

          Now, with our understanding that a constant expression allows arbitrary (implementation dependent, not implementation defined, ordering), do we think it has any value to open up for this for the VALUES clause? I continue to think it would be ok to give an error message (but with the improved error message i propose in my patch). For SELECT, we should continue to allow it (it's the standard; upwards compatibility). For set operations the current restrictions continue to apply, cf DERBY-2459.

          Opinions?

          Show
          Dag H. Wanvik added a comment - Now, with our understanding that a constant expression allows arbitrary (implementation dependent , not implementation defined, ordering), do we think it has any value to open up for this for the VALUES clause? I continue to think it would be ok to give an error message (but with the improved error message i propose in my patch). For SELECT, we should continue to allow it (it's the standard; upwards compatibility). For set operations the current restrictions continue to apply, cf DERBY-2459 . Opinions?
          Hide
          Dag H. Wanvik added a comment -

          Thanks for looking at this one, Knut. The current patch does throw 1 and 2). I'll file another bug for 3) since it's a different issue. Btw, it happens for SELECT, too. Once, that's fixed, it should throw error on 3) due to the new patch here.

          [The NPE happens when pulling up the order by expression seemingly due to a compiler phase problem:

          Caused by: java.lang.NullPointerException
          at org.apache.derby.impl.sql.compile.CastNode.getConstantValueAsObject(CastNode.java:851)
          at org.apache.derby.impl.sql.compile.OrderByColumn.isReferedColByNum(OrderByColumn.java:466)
          at org.apache.derby.impl.sql.compile.OrderByColumn.pullUpOrderByColumn(OrderByColumn.java:403)
          at org.apache.derby.impl.sql.compile.OrderByList.pullUpOrderByColumns(OrderByList.java:195)
          at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(CursorNode.java:254)

          The variable sourceCTI is (still) null; being set by CastNode# bindCastNodeOnly, which presumably hasn't yet been run.
          ]

          Uploading version b of this patch; there was a trailing a blanks issue in the canon for the orderby.sql script, otherwise the regressions passed without error.

          Show
          Dag H. Wanvik added a comment - Thanks for looking at this one, Knut. The current patch does throw 1 and 2). I'll file another bug for 3) since it's a different issue. Btw, it happens for SELECT, too. Once, that's fixed, it should throw error on 3) due to the new patch here. [The NPE happens when pulling up the order by expression seemingly due to a compiler phase problem: Caused by: java.lang.NullPointerException at org.apache.derby.impl.sql.compile.CastNode.getConstantValueAsObject(CastNode.java:851) at org.apache.derby.impl.sql.compile.OrderByColumn.isReferedColByNum(OrderByColumn.java:466) at org.apache.derby.impl.sql.compile.OrderByColumn.pullUpOrderByColumn(OrderByColumn.java:403) at org.apache.derby.impl.sql.compile.OrderByList.pullUpOrderByColumns(OrderByList.java:195) at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(CursorNode.java:254) The variable sourceCTI is (still) null; being set by CastNode# bindCastNodeOnly, which presumably hasn't yet been run. ] Uploading version b of this patch; there was a trailing a blanks issue in the canon for the orderby.sql script, otherwise the regressions passed without error.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for investigating, Dag.

          > For the record, ordering by column numbers is non-standard, it was last
          > allowed in SQL-92. Since newer versions of the standard allow expressions,
          > there is a semantic ambiguity when a single integer is used: Derby considers
          > it to be a column number only if it is a a single integer literal,
          > otherwise the expression is treated as a post-92 value expression.

          So in the case of ORDER BY 1+2, the sort key is the expression 1+2. That is,
          the value 3. Not column #1 + column #2, and not column #3. And since the sort
          key evaluates to the exact same value for every row, any ordering is OK.

          I think this means the SELECT statements listed above are OK.

          As to the VALUES statements, we have:

          1) VALUES 1,2 ORDER BY 1+2: Currently throws a syntax error. It could continue
          to do that, as it's not particularly useful to order results by a constant, or
          we could allow it and return the values 1 and 2 in no particular order.

          2) VALUES 1 ORDER BY 1+2: Currently returns one row with two columns. If we
          allow (1), this query should return a single row with a single
          column. Otherwise, it should raise the same error as (1).

          3) VALUES 1 ORDER BY INT(1): Currently throws NullPointerException. If (1) and
          (2) raise syntax errors, so should this query. Otherwise, it should return a
          single row with a single column.

          Show
          Knut Anders Hatlen added a comment - Thanks for investigating, Dag. > For the record, ordering by column numbers is non-standard, it was last > allowed in SQL-92. Since newer versions of the standard allow expressions, > there is a semantic ambiguity when a single integer is used: Derby considers > it to be a column number only if it is a a single integer literal, > otherwise the expression is treated as a post-92 value expression. So in the case of ORDER BY 1+2, the sort key is the expression 1+2. That is, the value 3. Not column #1 + column #2, and not column #3. And since the sort key evaluates to the exact same value for every row, any ordering is OK. I think this means the SELECT statements listed above are OK. As to the VALUES statements, we have: 1) VALUES 1,2 ORDER BY 1+2: Currently throws a syntax error. It could continue to do that, as it's not particularly useful to order results by a constant, or we could allow it and return the values 1 and 2 in no particular order. 2) VALUES 1 ORDER BY 1+2: Currently returns one row with two columns. If we allow (1), this query should return a single row with a single column. Otherwise, it should raise the same error as (1). 3) VALUES 1 ORDER BY INT(1): Currently throws NullPointerException. If (1) and (2) raise syntax errors, so should this query. Otherwise, it should return a single row with a single column.
          Hide
          Dag H. Wanvik added a comment -

          Attaching a patch which adds info in the OrderByList on whether it's attached to a table value constructor or not. If it is, we throw a new exception [1] if the order by expression isn't a column number. Added new tests; running regressions.

          [1] ERROR 4287B: "In this context, the ORDER BY clause may only specify a column number."

          Show
          Dag H. Wanvik added a comment - Attaching a patch which adds info in the OrderByList on whether it's attached to a table value constructor or not. If it is, we throw a new exception [1] if the order by expression isn't a column number. Added new tests; running regressions. [1] ERROR 4287B: "In this context, the ORDER BY clause may only specify a column number."
          Hide
          Dag H. Wanvik added a comment -

          For the record, ordering by column numbers is non-standard, it was last allowed in SQL-92. Since newer versions of the standard allow expressions, there is a semantic ambiguity when a single integer is used: Derby considers it to be a column number only if it is a a single integer literal, otherwise the expression is treated as a post-92 value expression.

          Show
          Dag H. Wanvik added a comment - For the record, ordering by column numbers is non-standard, it was last allowed in SQL-92. Since newer versions of the standard allow expressions, there is a semantic ambiguity when a single integer is used: Derby considers it to be a column number only if it is a a single integer literal, otherwise the expression is treated as a post-92 value expression.
          Hide
          Dag H. Wanvik added a comment - - edited

          For "values 1,2 order by 1+2", the code the throws the exception was added as part of DERBY-2459 "Ordering on a CASE-expression casues a NullPointerException when using a UNION". That patch may or may not have considered the union node's rôle in the VALUES clause, I'll investigate a bit.

          [Update:] This seems not to have been the case; the existing tests checks for the fact that added pulled up order by columns (as seen in the presence of expressions), would get removed when the set operator's RCL is constructed (cf Bryan's analysis in DERBY-2459), i.e. resultCol would be null:

          if (resultCol == null)
          throw StandardException.newException(SQLState.LANG_UNION_ORDER_BY);

          I think I'll avoid touching this mechanism for the VALUES checking. The text of the error message is also a bit misleading: the fact that table value constructors are implemented using UNION would be unknown to most users.

          Show
          Dag H. Wanvik added a comment - - edited For "values 1,2 order by 1+2", the code the throws the exception was added as part of DERBY-2459 "Ordering on a CASE-expression casues a NullPointerException when using a UNION". That patch may or may not have considered the union node's rôle in the VALUES clause, I'll investigate a bit. [Update:] This seems not to have been the case; the existing tests checks for the fact that added pulled up order by columns (as seen in the presence of expressions), would get removed when the set operator's RCL is constructed (cf Bryan's analysis in DERBY-2459 ), i.e. resultCol would be null: if (resultCol == null) throw StandardException.newException(SQLState.LANG_UNION_ORDER_BY); I think I'll avoid touching this mechanism for the VALUES checking. The text of the error message is also a bit misleading: the fact that table value constructors are implemented using UNION would be unknown to most users.
          Hide
          Dag H. Wanvik added a comment -

          It seems the standard allows the sort key to not determine the ordering. I also find no requirement that a column reference be present. Also cf. SQL 2011 section 10.10 general rules clause i): "i) Two rows that are not distinct with respect to the <sort specification>s are said to be peers of each other. The relative ordering of peers is implementation-dependent." I guess we shouldn't touch this behavior for SELECT anyway.

          Show
          Dag H. Wanvik added a comment - It seems the standard allows the sort key to not determine the ordering. I also find no requirement that a column reference be present. Also cf. SQL 2011 section 10.10 general rules clause i): "i) Two rows that are not distinct with respect to the <sort specification>s are said to be peers of each other. The relative ordering of peers is implementation-dependent." I guess we shouldn't touch this behavior for SELECT anyway.
          Hide
          Dag H. Wanvik added a comment -

          One approach would be to forbid order by expression if the expression does not contain a column reference or is a simple integer constant.
          It seems that even if the expression evaluates to a valid column number the result is wrong, so I think we should just flag those:

          > select i from t order by i; – column 1
          1
          2
          3

          > select i from t order by 1+0; – i.e. column 1
          2
          3
          1

          If a column is referenced it works as expected, though:
          > select i from t order by i+0
          1
          2
          3

          > select i from t order by -i+0
          3
          2
          1

          Can anyone think of a valid use cases for expressions that do not reference columns? I'll have a look at what the standard says on this.

          Show
          Dag H. Wanvik added a comment - One approach would be to forbid order by expression if the expression does not contain a column reference or is a simple integer constant. It seems that even if the expression evaluates to a valid column number the result is wrong, so I think we should just flag those: > select i from t order by i; – column 1 1 2 3 > select i from t order by 1+0; – i.e. column 1 2 3 1 If a column is referenced it works as expected, though: > select i from t order by i+0 1 2 3 > select i from t order by -i+0 3 2 1 Can anyone think of a valid use cases for expressions that do not reference columns? I'll have a look at what the standard says on this.
          Hide
          Dag H. Wanvik added a comment -

          There are more anomalies, this time with a select, cf. this statement:

          >create table t(i int, j int, k float);
          >insert into t values (1,1,1.0),(3,9,9.0),(2,4,4.0);
          >select * from t;
          I |J |K
          ----------------------------------------------
          1 |1 |1.0
          3 |9 |9.0
          2 |4 |4.0

          >select * from t order by 3+1;
          I |J |K
          ----------------------------------------------
          2 |4 |4.0
          3 |9 |9.0
          1 |1 |1.0

          So, the constant expression (which even if evaluated) would give a number of a non-present column, #4, isn't flagged, and strangely, the rows are returned backwards.

          Show
          Dag H. Wanvik added a comment - There are more anomalies, this time with a select, cf. this statement: >create table t(i int, j int, k float); >insert into t values (1,1,1.0),(3,9,9.0),(2,4,4.0); >select * from t; I |J |K ---------------------------------------------- 1 |1 |1.0 3 |9 |9.0 2 |4 |4.0 >select * from t order by 3+1; I |J |K ---------------------------------------------- 2 |4 |4.0 3 |9 |9.0 1 |1 |1.0 So, the constant expression (which even if evaluated) would give a number of a non-present column, #4, isn't flagged, and strangely, the rows are returned backwards.

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development