Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Urgency:
      Normal

      Description

      SQL 2008 allows ORDER BY to be specified in subqueries. In conjunction with OFFSET/FETCH and/or ROW_NUMBER
      meaningful subqueries with row ordering may be formulated. Cf. MySQL's LIMIT may be used in subqueries as well.
      Note that OFFSET/FETCH is currently not allowed in subqueries, either.

      1. derby-4397-1.diff
        55 kB
        Dag H. Wanvik
      2. derby-4397-1.stat
        1 kB
        Dag H. Wanvik
      3. derby-4397-2.diff
        50 kB
        Dag H. Wanvik
      4. derby-4397-2.stat
        1 kB
        Dag H. Wanvik
      5. derby-4397-all-subqueries.diff
        70 kB
        Dag H. Wanvik
      6. derby-4397-all-subqueries.stat
        2 kB
        Dag H. Wanvik
      7. derby-4397-insert-from-exists.diff
        59 kB
        Dag H. Wanvik
      8. derby-4397-insert-from-exists.stat
        2 kB
        Dag H. Wanvik
      9. derby-4397-sortavoidance-a.diff
        3 kB
        Dag H. Wanvik
      10. derby-4397-sortavoidance-a.stat
        0.2 kB
        Dag H. Wanvik
      11. orderBySpec.html
        11 kB
        Dag H. Wanvik
      12. orderBySpec.html
        10 kB
        Dag H. Wanvik
      13. orderBySpec.html
        10 kB
        Dag H. Wanvik
      14. orderBySpec.html
        9 kB
        Dag H. Wanvik
      15. orderBySpec.html
        9 kB
        Dag H. Wanvik
      16. orderBySpec.html
        9 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Hide
          Dag H. Wanvik added a comment -

          Attaching a first rev of a functional spec for this issue.

          Show
          Dag H. Wanvik added a comment - Attaching a first rev of a functional spec for this issue.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for writing the spec, Dag. It generally looks good, but I have a couple of questions:

          1) If I read the syntax copied from the spec correctly, this query should be allowed:

          SELECT * FROM T1 ORDER BY X UNION SELECT * FROM T2 ORDER BY Y

          With the suggested changes to the grammar, I think this query is not allowed unless it's rewritten with parentheses around the sub-queries:

          (SELECT * FROM T1 ORDER BY X) UNION (SELECT * FROM T2 ORDER BY Y)

          2) Do you think this issue will address DERBY-4? If so, it would be good to mention in the spec how INSERT INTO ... SELECT ... ORDER BY is supposed to behave (in particular with respect to identity columns).

          3) The spec mentions that ORDER BY in a sub-select can probably be ignored if there's no windowing or offset/fetch on the same query level. I think that's true (perhaps with the exception if INSERT INTO, see (2)). However, I think the ORDER BY can be ignored even if there is a windowing clause, since the windowing should be applied before the ORDER BY (see DERBY-4069) and would therefore not guarantee that for instance row numbers have the same ordering as the ORDER BY columns.

          4) The spec says that a release note is needed. Are you aware of any backwards-compatibility issues that require special mentioning, or is it sufficient to put it in the list of new features?

          Show
          Knut Anders Hatlen added a comment - Thanks for writing the spec, Dag. It generally looks good, but I have a couple of questions: 1) If I read the syntax copied from the spec correctly, this query should be allowed: SELECT * FROM T1 ORDER BY X UNION SELECT * FROM T2 ORDER BY Y With the suggested changes to the grammar, I think this query is not allowed unless it's rewritten with parentheses around the sub-queries: (SELECT * FROM T1 ORDER BY X) UNION (SELECT * FROM T2 ORDER BY Y) 2) Do you think this issue will address DERBY-4 ? If so, it would be good to mention in the spec how INSERT INTO ... SELECT ... ORDER BY is supposed to behave (in particular with respect to identity columns). 3) The spec mentions that ORDER BY in a sub-select can probably be ignored if there's no windowing or offset/fetch on the same query level. I think that's true (perhaps with the exception if INSERT INTO, see (2)). However, I think the ORDER BY can be ignored even if there is a windowing clause, since the windowing should be applied before the ORDER BY (see DERBY-4069 ) and would therefore not guarantee that for instance row numbers have the same ordering as the ORDER BY columns. 4) The spec says that a release note is needed. Are you aware of any backwards-compatibility issues that require special mentioning, or is it sufficient to put it in the list of new features?
          Hide
          Rick Hillegas added a comment -

          Thanks for the great spec, Dag. Other than some very minor typos, I have only one comment, related to Knut's observation above under (2):

          It would be good to know, either way, whether this work will affect DERBY-4. If this does enable INSERT ... SELECT ... ORDER BY, then it would be good to mention this somewhere in the user guides.

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - Thanks for the great spec, Dag. Other than some very minor typos, I have only one comment, related to Knut's observation above under (2): It would be good to know, either way, whether this work will affect DERBY-4 . If this does enable INSERT ... SELECT ... ORDER BY, then it would be good to mention this somewhere in the user guides. Thanks, -Rick
          Hide
          Dag H. Wanvik added a comment -

          Thanks for looking at this, Knut, Rick.
          @Knut
          Re 1), I can't see that ORDER BY should be allowed before a UNION without a (), can you explain how you reach that conclusion? It seems to me that what comes before the UNION must eventually come from a <query primary> which is either a table name or a (parenthesized) subquery..

          2) I hope so, I agree that in an INSERT context, an ORDER BY in the subquery should not be ignored in the presence of identity columns.
          3) Yes that is correct, I was thinking of the case where the ORDER BY is inside a subquery and the containing query contains a window. Btw, hat case may be covered by an ORDER BY inside the window specification (equivalently?) when we support that.
          4) I am not aware of any such backwards-compatibility issues at this point. I added this as a reminder to self to check it

          @Rick
          Good point about the user guide for INSERT. Will add a note on that.

          Show
          Dag H. Wanvik added a comment - Thanks for looking at this, Knut, Rick. @Knut Re 1), I can't see that ORDER BY should be allowed before a UNION without a (), can you explain how you reach that conclusion? It seems to me that what comes before the UNION must eventually come from a <query primary> which is either a table name or a (parenthesized) subquery.. 2) I hope so, I agree that in an INSERT context, an ORDER BY in the subquery should not be ignored in the presence of identity columns. 3) Yes that is correct, I was thinking of the case where the ORDER BY is inside a subquery and the containing query contains a window. Btw, hat case may be covered by an ORDER BY inside the window specification (equivalently?) when we support that. 4) I am not aware of any such backwards-compatibility issues at this point. I added this as a reminder to self to check it @Rick Good point about the user guide for INSERT. Will add a note on that.
          Hide
          Dag H. Wanvik added a comment -

          Uploaded rev 2. of the specification based on comments from Knut and Rick.

          Show
          Dag H. Wanvik added a comment - Uploaded rev 2. of the specification based on comments from Knut and Rick.
          Hide
          Knut Anders Hatlen added a comment -

          Regarding 1), you're of course right, I somehow managed to misread the syntax.

          Show
          Knut Anders Hatlen added a comment - Regarding 1), you're of course right, I somehow managed to misread the syntax.
          Hide
          Dag H. Wanvik added a comment - - edited

          Uploading an experimental patch (derby-4397-insert-from-exists) which

          • incorporates the revamped patch for DERBY-4
          • adds ORDER BY to FromSubqueries
          • adds ORDER BY to EXISTS/NOT EXISTS subqueries
          • adds test cases to the above as OrderByInSubqueries

          The patch is by no means complete; only some subqueries are yet allowed (parser blocks others),
          but the general approach should hopefully extend to the other kinds, since the code added to
          SubqueryNode is pretty generic. I just wanted to post this early in case anybody has input on the general approach taken.

          Show
          Dag H. Wanvik added a comment - - edited Uploading an experimental patch (derby-4397-insert-from-exists) which incorporates the revamped patch for DERBY-4 adds ORDER BY to FromSubqueries adds ORDER BY to EXISTS/NOT EXISTS subqueries adds test cases to the above as OrderByInSubqueries The patch is by no means complete; only some subqueries are yet allowed (parser blocks others), but the general approach should hopefully extend to the other kinds, since the code added to SubqueryNode is pretty generic. I just wanted to post this early in case anybody has input on the general approach taken.
          Hide
          Dag H. Wanvik added a comment -

          Attaching rev. 3 of specification (added SQL standard's feature numbers).

          Show
          Dag H. Wanvik added a comment - Attaching rev. 3 of specification (added SQL standard's feature numbers).
          Hide
          Dag H. Wanvik added a comment -

          Uploading derby-4397-all-subqueries. This patch opens up for ORDER BY in all subqueries as per the spec.
          A previous version of this passed the regression tests. This is still early
          days; I need to do a write-up and add more tests. Feel free to take this for a test drive
          and shoot holes in it

          Show
          Dag H. Wanvik added a comment - Uploading derby-4397-all-subqueries. This patch opens up for ORDER BY in all subqueries as per the spec. A previous version of this passed the regression tests. This is still early days; I need to do a write-up and add more tests. Feel free to take this for a test drive and shoot holes in it
          Hide
          Dag H. Wanvik added a comment -

          Uploading a minor revision of the spec.

          Show
          Dag H. Wanvik added a comment - Uploading a minor revision of the spec.
          Hide
          Dag H. Wanvik added a comment -

          Uploading a tentatively complete version of this patch,
          derby-4397-1. After DERBY-4442 and the other cleanups in the INSERT
          area went in, this patch now also solves DERBY-4, including the
          expected ordering of identity columns.

          Additionally, the INSERT cleanup has made it possible to simplify the
          patch is several places compared with the earlier versions.

          When this patch goes in, we should be ready to start work on allowing
          FETCH/OFFSET in subqueries as well. Please review.

          Patch details:

          M java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj

          Wire in syntax for ORDER BY, cf. the specification document.

          M java/engine/org/apache/derby/impl/sql/compile/FromSubquery.java

          Add code to pull up and bind ORDER BY on a "from" subquery and finally
          push it down to underlying result set as part of preprocess.

          Change a call to RCL.size to be RCL.visibleSize, to account for the fact
          that there may now be extra columns pulled up due to ORDER BY.

          M java/engine/org/apache/derby/impl/sql/compile/SubqueryNode.java

          Add code to pull up and bind ORDER BY on a from subquery and finally
          push it down to result set as part of preprocess.

          We also forbid flattening of a subquery if we have an ORDER BY on it.

          The parser transiently uses SubqueryNode before replacing it with a
          FromSubquery node so I added a getter method to retrive the order by
          list to be used in that replacement.

          M java/engine/org/apache/derby/impl/sql/compile/InsertNode.java

          Add code to pull up and bind ORDER BY on a from subquery and finally
          push it down to result set as part of preprocess. For the push, see
          also comments for NormalizeResultSetNode and ProjectRestrictNode.

          M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java

          Add logic to fetch the ORDER BY list when we parse parse a view text.
          Next, hand it on to the fromSubquery being constructed for the view.

          M java/engine/org/apache/derby/impl/sql/compile/ResultSetNode.java

          Replace call to size with visibleSize, see above. Add default
          implementation of new RSN method setOrderBy needed by parser for
          primaries to receive an ORDER BY list.

          M java/engine/org/apache/derby/impl/sql/compile/OrderByNode.java

          Adds logic in the generate method to poke the order by list's result
          set number into OrderByNode's result set number.
          We need to take note of result set number if ORDER BY is used in a
          subquery for the case where a PRN is inserted in top of the select's
          PRN to project away a sort column that is not part of the select
          list, e.g.

          select * from (select i from t order by j desc) s

          If the resultSetNumber is not correctly set in our resultColumns,
          code generation for the PRN above us will fail when calling
          resultColumns.generateCore -> VCN.generateExpression, cf. the Sanity
          assert in VCN.generateExpression on sourceResultSetNumber >= 0.

          M java/engine/org/apache/derby/impl/sql/compile/SelectNode.java

          Let flattenableInFromSubquery return false if we have an ORDER BY on a
          subquery select. Also modify the way the PRN is added if we have an
          ORDER BY on a subquery, so that references into the Selects RCL (made
          from above us in the query tree) will not be voided by our adding a
          PRN. The method is the same as used in other instance of this phase
          problem: reuse the same RCL for the PRN and make a new one for the
          select node, cf. for example DERBY-4450.

          M java/engine/org/apache/derby/impl/sql/compile/NormalizeResultSetNode.java
          M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java

          Add pushOrderBylist to push through to underlying result set, so it
          will work when pushed from InsertNode, which sometimes may have an
          intervening NormalizeResultSetNode and/or ProjectRestrictNode node
          over the real result set to be ordered when we get to preprocess time.

          M java/engine/org/apache/derby/impl/sql/compile/OrderByList.java

          Add code to remember generated result set number, see changes comments
          for OrderByNode.

          M java/engine/org/apache/derby/impl/sql/compile/CreateViewNode.java

          Add code to hold an ORDER BY list for a view query.

          M java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java

          Replace instances of size with visibleSize, see above.

          M java/engine/org/apache/derby/impl/sql/compile/AlterTableNode.java

          Trivial fix to printSubNode to make it safer for use in intermediate
          stages of processing.

          M java/engine/org/apache/derby/impl/sql/compile/DMLModStatementNode.java

          Added a missing printSubNodes and made a method static.

          M java/engine/org/apache/derby/impl/sql/compile/QueryTreeNode.java

          Added a debug utility method, stackPrint which prints current run-time
          stack trace on derby.log. Yes, I know, not really part of this patch,
          but I was too lazy to make a separate issue for it.. feel free to kick
          me.

          A java/testing/org/apache/derbyTesting/functionTests/tests/lang/OrderByInSubqueries.java

          Tests for this feature.

          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/orderby.sql
          M java/testing/org/apache/derbyTesting/functionTests/master/orderby.out

          Make negative tests positive (order by in subqueries now allowed).

          Show
          Dag H. Wanvik added a comment - Uploading a tentatively complete version of this patch, derby-4397-1. After DERBY-4442 and the other cleanups in the INSERT area went in, this patch now also solves DERBY-4 , including the expected ordering of identity columns. Additionally, the INSERT cleanup has made it possible to simplify the patch is several places compared with the earlier versions. When this patch goes in, we should be ready to start work on allowing FETCH/OFFSET in subqueries as well. Please review. Patch details: M java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj Wire in syntax for ORDER BY, cf. the specification document. M java/engine/org/apache/derby/impl/sql/compile/FromSubquery.java Add code to pull up and bind ORDER BY on a "from" subquery and finally push it down to underlying result set as part of preprocess. Change a call to RCL.size to be RCL.visibleSize, to account for the fact that there may now be extra columns pulled up due to ORDER BY. M java/engine/org/apache/derby/impl/sql/compile/SubqueryNode.java Add code to pull up and bind ORDER BY on a from subquery and finally push it down to result set as part of preprocess. We also forbid flattening of a subquery if we have an ORDER BY on it. The parser transiently uses SubqueryNode before replacing it with a FromSubquery node so I added a getter method to retrive the order by list to be used in that replacement. M java/engine/org/apache/derby/impl/sql/compile/InsertNode.java Add code to pull up and bind ORDER BY on a from subquery and finally push it down to result set as part of preprocess. For the push, see also comments for NormalizeResultSetNode and ProjectRestrictNode. M java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java Add logic to fetch the ORDER BY list when we parse parse a view text. Next, hand it on to the fromSubquery being constructed for the view. M java/engine/org/apache/derby/impl/sql/compile/ResultSetNode.java Replace call to size with visibleSize, see above. Add default implementation of new RSN method setOrderBy needed by parser for primaries to receive an ORDER BY list. M java/engine/org/apache/derby/impl/sql/compile/OrderByNode.java Adds logic in the generate method to poke the order by list's result set number into OrderByNode's result set number. We need to take note of result set number if ORDER BY is used in a subquery for the case where a PRN is inserted in top of the select's PRN to project away a sort column that is not part of the select list, e.g. select * from (select i from t order by j desc) s If the resultSetNumber is not correctly set in our resultColumns, code generation for the PRN above us will fail when calling resultColumns.generateCore -> VCN.generateExpression, cf. the Sanity assert in VCN.generateExpression on sourceResultSetNumber >= 0. M java/engine/org/apache/derby/impl/sql/compile/SelectNode.java Let flattenableInFromSubquery return false if we have an ORDER BY on a subquery select. Also modify the way the PRN is added if we have an ORDER BY on a subquery, so that references into the Selects RCL (made from above us in the query tree) will not be voided by our adding a PRN. The method is the same as used in other instance of this phase problem: reuse the same RCL for the PRN and make a new one for the select node, cf. for example DERBY-4450 . M java/engine/org/apache/derby/impl/sql/compile/NormalizeResultSetNode.java M java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java Add pushOrderBylist to push through to underlying result set, so it will work when pushed from InsertNode, which sometimes may have an intervening NormalizeResultSetNode and/or ProjectRestrictNode node over the real result set to be ordered when we get to preprocess time. M java/engine/org/apache/derby/impl/sql/compile/OrderByList.java Add code to remember generated result set number, see changes comments for OrderByNode. M java/engine/org/apache/derby/impl/sql/compile/CreateViewNode.java Add code to hold an ORDER BY list for a view query. M java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java Replace instances of size with visibleSize, see above. M java/engine/org/apache/derby/impl/sql/compile/AlterTableNode.java Trivial fix to printSubNode to make it safer for use in intermediate stages of processing. M java/engine/org/apache/derby/impl/sql/compile/DMLModStatementNode.java Added a missing printSubNodes and made a method static. M java/engine/org/apache/derby/impl/sql/compile/QueryTreeNode.java Added a debug utility method, stackPrint which prints current run-time stack trace on derby.log. Yes, I know, not really part of this patch, but I was too lazy to make a separate issue for it.. feel free to kick me. A java/testing/org/apache/derbyTesting/functionTests/tests/lang/OrderByInSubqueries.java Tests for this feature. M java/testing/org/apache/derbyTesting/functionTests/tests/lang/orderby.sql M java/testing/org/apache/derbyTesting/functionTests/master/orderby.out Make negative tests positive (order by in subqueries now allowed).
          Hide
          Dag H. Wanvik added a comment -

          Regressions passed with the latest patch.

          Show
          Dag H. Wanvik added a comment - Regressions passed with the latest patch.
          Hide
          Bryan Pendleton added a comment -

          Dag, thank you very much for working on this project. I think this will be a great new
          feature, and it will be very nice to resolve DERBY-4 as well, since it has been requested
          by a number of users.

          I read through the patch and it looks very thorough and complete; the code was
          clear and well-commented.

          I think we should file a sub-task of this issue to update the documentation for
          query flattening, to add the new restriction that a sub-query with an ORDER BY
          cannot be flattened.

          Show
          Bryan Pendleton added a comment - Dag, thank you very much for working on this project. I think this will be a great new feature, and it will be very nice to resolve DERBY-4 as well, since it has been requested by a number of users. I read through the patch and it looks very thorough and complete; the code was clear and well-commented. I think we should file a sub-task of this issue to update the documentation for query flattening, to add the new restriction that a sub-query with an ORDER BY cannot be flattened.
          Hide
          Dag H. Wanvik added a comment -

          Thanks for looking at the patch, Bryan! I upload a new version (5.0) of the specification which adds the doc item you suggest. I will file a JIRA for the docs in due course.

          Show
          Dag H. Wanvik added a comment - Thanks for looking at the patch, Bryan! I upload a new version (5.0) of the specification which adds the doc item you suggest. I will file a JIRA for the docs in due course.
          Hide
          Dag H. Wanvik added a comment -

          Committed derby-4397-2 as svn 891952. This version of the patch
          is the same as #1, except some cleanup in the new test, OrderByInSubqueries.
          I am not totally comfortable with the level of testing for this feature yet, but I will probably need to get OFFSET/FETCH in subqueries working before I can tease out
          more bugs, so I chose to commit this patch now.

          Show
          Dag H. Wanvik added a comment - Committed derby-4397-2 as svn 891952. This version of the patch is the same as #1, except some cleanup in the new test, OrderByInSubqueries. I am not totally comfortable with the level of testing for this feature yet, but I will probably need to get OFFSET/FETCH in subqueries working before I can tease out more bugs, so I chose to commit this patch now.
          Hide
          Dag H. Wanvik added a comment -

          Found that sort avoidance fails in a FromSubquery:
          If this query avoids sorting due to an index on i:

          select i from t order by i

          one would expect this to avoid it as well:

          select * from (select i from t order by i) t

          but I see that sorting is performed in the latter case. It would be nice to make it work in the latter case as well..

          Show
          Dag H. Wanvik added a comment - Found that sort avoidance fails in a FromSubquery: If this query avoids sorting due to an index on i: select i from t order by i one would expect this to avoid it as well: select * from (select i from t order by i) t but I see that sorting is performed in the latter case. It would be nice to make it work in the latter case as well..
          Hide
          Dag H. Wanvik added a comment -

          The issue is that OrderByList.sortRequired will force sorting if it does not find a column reference in the OrderByColumn's result column's expression tree, cf. code ca line 477 in OrderByList.java.
          In the latter query, I notice that at this point in the code, we have a VirtualColumnNode instead.

          Show
          Dag H. Wanvik added a comment - The issue is that OrderByList.sortRequired will force sorting if it does not find a column reference in the OrderByColumn's result column's expression tree, cf. code ca line 477 in OrderByList.java. In the latter query, I notice that at this point in the code, we have a VirtualColumnNode instead.
          Hide
          Dag H. Wanvik added a comment -

          The reason we see a VirtualColumnNode in the OrderByColumn is that FromSubquery changes the result column list of its subquery after we have bound the order by list, cf. this comment in the code:

          /* Now that we've bound the expressions in the subquery, we

          • can propagate the subquery's RCL up to the FromSubquery.
          • Get the subquery's RCL, assign shallow copy back to
          • it and create new VirtualColumnNodes for the original's
          • ResultColumn.expressions.
            :
            */
            subquery.setResultColumns(subqueryRCL.copyListAndObjects());
            subqueryRCL.genVirtualColumnNodes(subquery, subquery.getResultColumns());
            resultColumns = subqueryRCL;

          that is, the result column list that was used when binding the order by list has now been moved up one level and been given an additional level of VCNs.

          I find I can avoid this problem by either:

          a) moving the binding of the order by list til after this step, or
          b) construct a new RCL for the FromSubquery and leave the old one alone.

          a) is the safer option; I don't know if b) will have any negative side effects, but will run the regressions to see if anything pops. A priori I would prefer b) since it seems a more direct solution.

          Show
          Dag H. Wanvik added a comment - The reason we see a VirtualColumnNode in the OrderByColumn is that FromSubquery changes the result column list of its subquery after we have bound the order by list, cf. this comment in the code: /* Now that we've bound the expressions in the subquery, we can propagate the subquery's RCL up to the FromSubquery. Get the subquery's RCL, assign shallow copy back to it and create new VirtualColumnNodes for the original's ResultColumn.expressions. : */ subquery.setResultColumns(subqueryRCL.copyListAndObjects()); subqueryRCL.genVirtualColumnNodes(subquery, subquery.getResultColumns()); resultColumns = subqueryRCL; that is, the result column list that was used when binding the order by list has now been moved up one level and been given an additional level of VCNs. I find I can avoid this problem by either: a) moving the binding of the order by list til after this step, or b) construct a new RCL for the FromSubquery and leave the old one alone. a) is the safer option; I don't know if b) will have any negative side effects, but will run the regressions to see if anything pops. A priori I would prefer b) since it seems a more direct solution.
          Hide
          Bryan Pendleton added a comment -

          Hi Dag,

          Did you investigate making the code in OrderByList.sortRequired smarter? It seems like it's trying
          to figure out whether we are ordering by an expression, or by a simple column reference, and the
          VirtualColumnNode confuses it and makes it go into the "sorting by an expression" scenario.

          If you add some code to OrderByList.sortRequired to check for the presence of a VirtualColumnNode,
          and then to fetch the column reference from the virtual colum's result column, does that help any?

          Show
          Bryan Pendleton added a comment - Hi Dag, Did you investigate making the code in OrderByList.sortRequired smarter? It seems like it's trying to figure out whether we are ordering by an expression, or by a simple column reference, and the VirtualColumnNode confuses it and makes it go into the "sorting by an expression" scenario. If you add some code to OrderByList.sortRequired to check for the presence of a VirtualColumnNode, and then to fetch the column reference from the virtual colum's result column, does that help any?
          Hide
          Dag H. Wanvik added a comment -

          Thanks for looking at this, Bryan! No, I didn't try to make sortRequired go through VirtualColumnNode, since
          I think its presence was an accident in this particular case, cf. the rewrite I mention above. Now, there may now (after adding order by in subqueries) be other use cases where going through the VCN would be valid, so I'll have a look.
          Even with the above suggested fix, I still need to solve the redundant sort incurred in this silly query:

          select * from (select i from t order by i) t order by i

          The outer order by doesn't yet get the fact the the subquery already has done the job. Maybe your idea can help solve this case. Btw, solution b) above did not set off any regression errors, so it seems safe.

          Show
          Dag H. Wanvik added a comment - Thanks for looking at this, Bryan! No, I didn't try to make sortRequired go through VirtualColumnNode, since I think its presence was an accident in this particular case, cf. the rewrite I mention above. Now, there may now (after adding order by in subqueries) be other use cases where going through the VCN would be valid, so I'll have a look. Even with the above suggested fix, I still need to solve the redundant sort incurred in this silly query: select * from (select i from t order by i) t order by i The outer order by doesn't yet get the fact the the subquery already has done the job. Maybe your idea can help solve this case. Btw, solution b) above did not set off any regression errors, so it seems safe.
          Hide
          Dag H. Wanvik added a comment - - edited

          Uploading derby-4397-sortavoidance-a, a patch which makes sort avoidance work in the case described above, using solution b, which passed regressions. A test case is added to verify this (OrderByAndOffsetFetchInSubqueries#testSelectSubqueriesSortAvoidance).

          It does not yet avoid sorting in the case,

          select * from (select i from t order by i) t order by i

          presumably because the outer order by doesn't make use of the fact that the subquery is already sorted on that column. Without the inner "order by", the subquery is flattened and no sorting is performed. Is the flattening a prerequisite for the optimizer to handle such cases?
          The query

          select * from (select i from t offset 0 rows) t order by i

          currently stops flattening from happening, and it also incurs an unneccecary sort (there is an index on i here)

          Show
          Dag H. Wanvik added a comment - - edited Uploading derby-4397-sortavoidance-a, a patch which makes sort avoidance work in the case described above, using solution b, which passed regressions. A test case is added to verify this (OrderByAndOffsetFetchInSubqueries#testSelectSubqueriesSortAvoidance). It does not yet avoid sorting in the case, select * from (select i from t order by i) t order by i presumably because the outer order by doesn't make use of the fact that the subquery is already sorted on that column. Without the inner "order by", the subquery is flattened and no sorting is performed. Is the flattening a prerequisite for the optimizer to handle such cases? The query select * from (select i from t offset 0 rows) t order by i currently stops flattening from happening, and it also incurs an unneccecary sort (there is an index on i here)
          Hide
          Dag H. Wanvik added a comment -

          Committed patch derby-4397-sortavoidance-a as svn 899588.

          Show
          Dag H. Wanvik added a comment - Committed patch derby-4397-sortavoidance-a as svn 899588.
          Hide
          Dag H. Wanvik added a comment -

          I have done some investigation regarding my question above "Is the
          flattening a prerequisite for the optimizer to handle such cases?".

          If anyone is familiar with how the optimizer works, I would appreciate
          any comments, since this is pretty much unknown code to me.

          It seems that information about ordering of result sets from
          subqueries is lost/not used by the outer query if the subquery can not
          be flattened. The following example query also gets an unnecessary
          sort:

          (A) select * from (select distinct i, (select count from sys.systables) as j from t) t order by i

          Here the nested "select count" in the select list stops flattening
          from happening, cf. SelectNode#flattenableInFromSubquery ca line 1379.
          So the tree structure is somewhat similar to what we had in the query

          (B) select * from (select i from t offset 0 rows) t order by i

          when optimization starts, so I thought it would be useful to look at
          its optimization (query A):

          The from subquery is optimized separately as far as I can tell (I will
          ignore the SELECT subquery, I only introduced it to prevent
          flattening).

          The DISTINCT is optimized away since there is an index on i (in
          optimization of the subquery). The ORDER BY in the outer query leads
          to a sort, though. From looking at the code this is to be expected, I
          think:

          The check done in SelectNode calls orderByList.getSortNeeded() which
          will return true unless orderByList.sortNotNeeded has been
          called. This call is only made by the optimizer if it has found a
          SORT_AVOIDANCE_PLAN. The criterion for finding a sort avoidance plan
          hinges on the call to requiredRowOrdering.sortRequired
          (i.e. OrderByList.sortRequired) in
          OptimizerImpl.getNextDecoratedPermutation ca line 1799. If
          sortRequired does not return RequiredRowOrdering.NOTHING_REQUIRED, a
          sort will be necessary. That check fails to detect that nothing is
          required since the referenced columns table number is not in the table
          map provided. I think this is since the column stems from the
          subquery.

          Note that if the an "order by i" is added to the inner query, this
          will be handled correctly and optimized away, but the outer ORDER BY
          still leads so sorting.

          In sum, it seems to me that the information about any row ordering in
          the unflattened subquery is not transmitted up to the optimization of
          the outer query. If this conclusion is correct, we would need a new
          mechanism to convey the missing information to the upper optimizer in
          the presence of unflattened "from subqueries".

          Show
          Dag H. Wanvik added a comment - I have done some investigation regarding my question above "Is the flattening a prerequisite for the optimizer to handle such cases?". If anyone is familiar with how the optimizer works, I would appreciate any comments, since this is pretty much unknown code to me. It seems that information about ordering of result sets from subqueries is lost/not used by the outer query if the subquery can not be flattened. The following example query also gets an unnecessary sort: (A) select * from (select distinct i, (select count from sys.systables) as j from t) t order by i Here the nested "select count " in the select list stops flattening from happening, cf. SelectNode#flattenableInFromSubquery ca line 1379. So the tree structure is somewhat similar to what we had in the query (B) select * from (select i from t offset 0 rows) t order by i when optimization starts, so I thought it would be useful to look at its optimization (query A): The from subquery is optimized separately as far as I can tell (I will ignore the SELECT subquery, I only introduced it to prevent flattening). The DISTINCT is optimized away since there is an index on i (in optimization of the subquery). The ORDER BY in the outer query leads to a sort, though. From looking at the code this is to be expected, I think: The check done in SelectNode calls orderByList.getSortNeeded() which will return true unless orderByList.sortNotNeeded has been called. This call is only made by the optimizer if it has found a SORT_AVOIDANCE_PLAN. The criterion for finding a sort avoidance plan hinges on the call to requiredRowOrdering.sortRequired (i.e. OrderByList.sortRequired) in OptimizerImpl.getNextDecoratedPermutation ca line 1799. If sortRequired does not return RequiredRowOrdering.NOTHING_REQUIRED, a sort will be necessary. That check fails to detect that nothing is required since the referenced columns table number is not in the table map provided. I think this is since the column stems from the subquery. Note that if the an "order by i" is added to the inner query, this will be handled correctly and optimized away, but the outer ORDER BY still leads so sorting. In sum, it seems to me that the information about any row ordering in the unflattened subquery is not transmitted up to the optimization of the outer query. If this conclusion is correct, we would need a new mechanism to convey the missing information to the upper optimizer in the presence of unflattened "from subqueries".
          Hide
          Dag H. Wanvik added a comment -

          Uploading a new revision of the specification (adds syntax changes for view creation).

          Show
          Dag H. Wanvik added a comment - Uploading a new revision of the specification (adds syntax changes for view creation).
          Hide
          Dag H. Wanvik added a comment -

          Resolving this issue; further optimization if deemed necessary later, will be handled in new a new issue.

          Show
          Dag H. Wanvik added a comment - Resolving this issue; further optimization if deemed necessary later, will be handled in new a new issue.

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Dag H. Wanvik
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development