Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.4.1.3
    • Component/s: SQL
    • Labels:
      None

      Description

      I suggest to implement the "null ordering" option for the "ORDER BY" clause:

      According to the SQL standard, 10.10 <sort specification list>:

      <null ordering> ::=
      NULLS FIRST

      NULLS LAST
      1. prototypeCodeNoTests.diff
        34 kB
        Bryan Pendleton
      2. prototypeCodeNoTests_v2.diff
        41 kB
        Bryan Pendleton
      3. prototypeCodeNoTests_v3.diff
        27 kB
        Bryan Pendleton
      4. nullOrderingPatch.diff
        43 kB
        Bryan Pendleton

        Issue Links

          Activity

          Hide
          bryanpendleton Bryan Pendleton added a comment -

          I read the DatabaseMetaData javadoc for a while, and I didn't see
          any reason to believe that our current implementation should
          change due to the NULLS FIRST/LAST support. The DatabaseMetaData
          javadoc doesn't specifically mention the impact of NULLS FIRST/LAST,
          but since the DatabaseMetaData methods are supposed to describe
          the behavior of the database system as a whole, and the NULLS
          FIRST/LAST is a query-specific behavior, it seems reasonable that
          the DatabaseMetadata methods should not take NULLS FIRST/LAST
          into account.

          So I think it's safe to re-resolve this issue.

          Show
          bryanpendleton Bryan Pendleton added a comment - I read the DatabaseMetaData javadoc for a while, and I didn't see any reason to believe that our current implementation should change due to the NULLS FIRST/LAST support. The DatabaseMetaData javadoc doesn't specifically mention the impact of NULLS FIRST/LAST, but since the DatabaseMetaData methods are supposed to describe the behavior of the database system as a whole, and the NULLS FIRST/LAST is a query-specific behavior, it seems reasonable that the DatabaseMetadata methods should not take NULLS FIRST/LAST into account. So I think it's safe to re-resolve this issue.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          It occurs to me that maybe I should have updated the
          DatabaseMetaData implementation, specifically the methods:

          • nullsAreSortedAtEnd()
          • nullsAreSortedAtStart()
          • nullsAreSortedHigh()
          • nullsAreSortedLow()

          I'm not totally sure what those methods are supposed to return,
          and it's quite possible that no changes are needed, but I
          thought that to be safe, I would re-open this issue and go
          examine the documentation for those methods to see if they
          need to be changed.

          Show
          bryanpendleton Bryan Pendleton added a comment - It occurs to me that maybe I should have updated the DatabaseMetaData implementation, specifically the methods: nullsAreSortedAtEnd() nullsAreSortedAtStart() nullsAreSortedHigh() nullsAreSortedLow() I'm not totally sure what those methods are supposed to return, and it's quite possible that no changes are needed, but I thought that to be safe, I would re-open this issue and go examine the documentation for those methods to see if they need to be changed.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          The implementation is intended to use indexes in the cases where indexes are currently used.

          That is, in the index, NULL values are always sorted high, so if the NULL FIRST/LAST setting
          is such that the index can be used to satisfy the ORDER BY, then it will be used. Otherwise,
          a sort will be required.

          It would be great if you can verify this independently. I added a test to the regression suite, but
          more tests are always helpful.

          Show
          bryanpendleton Bryan Pendleton added a comment - The implementation is intended to use indexes in the cases where indexes are currently used. That is, in the index, NULL values are always sorted high, so if the NULL FIRST/LAST setting is such that the index can be used to satisfy the ORDER BY, then it will be used. Otherwise, a sort will be required. It would be great if you can verify this independently. I added a test to the regression suite, but more tests are always helpful.
          Hide
          chdh@inventec.ch Christian d'Heureuse added a comment -

          Bryan, thanks for implementing NULLS FIRST/LAST.
          Does your current implementation use indexes for optimizing ORDER BYs?

          Show
          chdh@inventec.ch Christian d'Heureuse added a comment - Bryan, thanks for implementing NULLS FIRST/LAST. Does your current implementation use indexes for optimizing ORDER BYs?
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          Committed nullOrderingPatch.diff to the trunk as revision 567314.

          Show
          bryanpendleton Bryan Pendleton added a comment - Committed nullOrderingPatch.diff to the trunk as revision 567314.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          I'm intending to submit nullOrderingPatch.diff fairly soon. If anyone is reviewing this patch and has feedback, please let me know.

          Show
          bryanpendleton Bryan Pendleton added a comment - I'm intending to submit nullOrderingPatch.diff fairly soon. If anyone is reviewing this patch and has feedback, please let me know.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          Thanks Thomas for the review and testing!

          Show
          bryanpendleton Bryan Pendleton added a comment - Thanks Thomas for the review and testing!
          Hide
          thomanie Thomas Nielsen added a comment -

          nullOrderingPatch.diff works as expected from what I can tell from manual testing.
          suitesAll show no new problems. Code looks nice and clean.

          Show
          thomanie Thomas Nielsen added a comment - nullOrderingPatch.diff works as expected from what I can tell from manual testing. suitesAll show no new problems. Code looks nice and clean.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          Attached is nullOrderingPatch.diff, a patch proposal. This patch builds on the prototype_v3 patch by:

          • adding code to OrderByList to ensure that ORDER BY triggers a sort when necessary
          • adding tests to orderby.sql and orderbyElimination.sql to demonstrate the behavior of NULLS FIRST/LAST
          • adding a few more comments to the code changes.

          I'd be most grateful if you could have a look at this patch proposal and tell me what you think.

          Show
          bryanpendleton Bryan Pendleton added a comment - Attached is nullOrderingPatch.diff, a patch proposal. This patch builds on the prototype_v3 patch by: adding code to OrderByList to ensure that ORDER BY triggers a sort when necessary adding tests to orderby.sql and orderbyElimination.sql to demonstrate the behavior of NULLS FIRST/LAST adding a few more comments to the code changes. I'd be most grateful if you could have a look at this patch proposal and tell me what you think.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          It seems that the Wisconsin diff is also present in the trunk, although I'm having trouble getting to http://dbtg.thresher.com to verify that other builds of the trunk are seeing it as well. At least in my build of the trunk, without any of my prototype changes, I see the diff.

          Show
          bryanpendleton Bryan Pendleton added a comment - It seems that the Wisconsin diff is also present in the trunk, although I'm having trouble getting to http://dbtg.thresher.com to verify that other builds of the trunk are seeing it as well. At least in my build of the trunk, without any of my prototype changes, I see the diff.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          Attached is prototypeCodeNoTests_v3.diff, updated in response to the various suggestions.

          This version 3 patch is much simpler than the previous two; Dan's ideas about sharing common code into DataType.java were a big win.

          My next steps will be to: add some new tests, investigate the sort avoidance issue, and track down a diff in the Wisconsin tests involving outer joins. All the other existing tests pass.

          Show
          bryanpendleton Bryan Pendleton added a comment - Attached is prototypeCodeNoTests_v3.diff, updated in response to the various suggestions. This version 3 patch is much simpler than the previous two; Dan's ideas about sharing common code into DataType.java were a big win. My next steps will be to: add some new tests, investigate the sort avoidance issue, and track down a diff in the Wisconsin tests involving outer joins. All the other existing tests pass.
          Hide
          chdh@inventec.ch Christian d'Heureuse added a comment -

          The Oracle documentation says:

          ASC | DESC Specify the ordering sequence (ascending or descending). ASC is the
          default.

          NULLS FIRST | NULLS LAST Specify whether returned rows containing nulls should
          appear first or last in the ordering sequence.
          NULLS LAST is the default for ascending order, and NULLS FIRST is the default for
          descending order.

          (The default for NULLS FIRST/LAST is the same as in Derby)

          Show
          chdh@inventec.ch Christian d'Heureuse added a comment - The Oracle documentation says: ASC | DESC Specify the ordering sequence (ascending or descending). ASC is the default. NULLS FIRST | NULLS LAST Specify whether returned rows containing nulls should appear first or last in the ordering sequence. NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order. (The default for NULLS FIRST/LAST is the same as in Derby)
          Hide
          chdh@inventec.ch Christian d'Heureuse added a comment -

          Same result with Oracle 10G.

          Show
          chdh@inventec.ch Christian d'Heureuse added a comment - Same result with Oracle 10G.
          Hide
          chdh@inventec.ch Christian d'Heureuse added a comment -

          I have tested with H2 (http://www.h2database.com):

          create table temp1 (i int);
          insert into temp1 values (1);
          insert into temp1 values (2);
          insert into temp1 values (null);
          select * from temp1 order by i desc nulls first;

          Result:
          NULL
          2
          1

          Show
          chdh@inventec.ch Christian d'Heureuse added a comment - I have tested with H2 ( http://www.h2database.com): create table temp1 (i int); insert into temp1 values (1); insert into temp1 values (2); insert into temp1 values (null); select * from temp1 order by i desc nulls first; Result: NULL 2 1
          Hide
          chdh@inventec.ch Christian d'Heureuse added a comment -

          i think the SQL standard (section 10.10) is not clear on this.
          It defines NULLS FIRST/LAST with the comparison operator:
          ... "If NULLS FIRST is specified or implied, then PVi <comp op> QVi is considered to be True." ...

          Show
          chdh@inventec.ch Christian d'Heureuse added a comment - i think the SQL standard (section 10.10) is not clear on this. It defines NULLS FIRST/LAST with the comparison operator: ... "If NULLS FIRST is specified or implied, then PVi <comp op> QVi is considered to be True." ...
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          Thanks very much for the comments and suggestions, Dan and Christian! I believe that I have the interpretation of NULLS FIRST/LAST wrong in the prototype code, and that Derby's current behavior is actually:

          NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order.

          Section 10.10 of the SQL standard is the relevant section.

          I'll update the wiki to incorporate this revised understanding, then I'll have a fresh look at the code, and in the process I'll incorporate the other suggestions. I'm particularly keen on trying to avoid having to touch so many files; Dan your comments on the compare functions are very helpful.

          Show
          bryanpendleton Bryan Pendleton added a comment - Thanks very much for the comments and suggestions, Dan and Christian! I believe that I have the interpretation of NULLS FIRST/LAST wrong in the prototype code, and that Derby's current behavior is actually: NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order. Section 10.10 of the SQL standard is the relevant section. I'll update the wiki to incorporate this revised understanding, then I'll have a fresh look at the code, and in the process I'll incorporate the other suggestions. I'm particularly keen on trying to avoid having to touch so many files; Dan your comments on the compare functions are very helpful.
          Hide
          djd Daniel John Debrunner added a comment -

          I see from the wiki that the 'Interaction of NULLS FIRST/LAST and ASC/DESC' is still not resolved. I think this needs to be resolved before any code is committed.

          If 'NULLS FIRST' really means first regardless of ASC/DESC then:

          • The wiki says that NULLS LAST is the default for Derby, but actually it will be NULLS FIRST if DESC and NULLS LAST if ASC (or is it the other way around
          • the code would need to be clear so that in the data type the variables & concept are described as nulls low or nulls high. Continuing the nulls first/last naming into DVD where it would actually have a different meaning would lead to confusion in the code. The SQL language layer would continue to have naming indicating first/last concept.
          Show
          djd Daniel John Debrunner added a comment - I see from the wiki that the 'Interaction of NULLS FIRST/LAST and ASC/DESC' is still not resolved. I think this needs to be resolved before any code is committed. If 'NULLS FIRST' really means first regardless of ASC/DESC then: The wiki says that NULLS LAST is the default for Derby, but actually it will be NULLS FIRST if DESC and NULLS LAST if ASC (or is it the other way around the code would need to be clear so that in the data type the variables & concept are described as nulls low or nulls high. Continuing the nulls first/last naming into DVD where it would actually have a different meaning would lead to confusion in the code. The SQL language layer would continue to have naming indicating first/last concept.
          Hide
          djd Daniel John Debrunner added a comment -

          Why does this issue block DERBY-581?

          Show
          djd Daniel John Debrunner added a comment - Why does this issue block DERBY-581 ?
          Hide
          chdh@inventec.ch Christian d'Heureuse added a comment -

          What happens when an index exists with NULLS LAST and a query uses NULLS FIRST in the ORDER BY clause? Will this be optimized, if the query only references fields that are included in the index?

          Show
          chdh@inventec.ch Christian d'Heureuse added a comment - What happens when an index exists with NULLS LAST and a query uses NULLS FIRST in the ORDER BY clause? Will this be optimized, if the query only references fields that are included in the index?
          Hide
          djd Daniel John Debrunner added a comment -

          Some thoughts on the patch (v2):

          Does the patch work with sort avoidance? If a scan is against an index and there is an ORDER BY clause that matches the index then we avoid the order by.
          This optimization has to be turned off if NULLS FIRST is set (unless the columns are not-nullable).

          The passing of an extra argument to the compare() method, and for the regular compare now calling the new compare method will affect performance. I don't know how significant it will be, but within an index btree lookup compare will be called a lot, and now we are passing in an argument which will always be false. A possible alternative that would not affect existing performance, would be to have a single nulls first compare in DataType like (code only, comments you added should be kept):

          public final int compare(DataValueDescriptor other, boolean nullsFirst) throws StandardException
          {
          if (this.isNull() || other.isNull())

          { if (!isNull()) return nullsFirst ? 1 : -1; if (!other.isNull()) return nullsFirst ? -1 : 1; return 0; // both null }

          return compare(other);
          }

          Moving the single argument compare() method in each data type to the parent class DataType would avoid repeated identical code in each class. However with the above comment it would mean this comment doesn't apply, but it's a idea to remember for future changes, if possible push duplicate code up the hierarchy.

          For IndexColumnOrder, one approach would be to have two constructors, one with the old signature which would default nulls first to false, and then the new one. This would mean that code that isn't changing doesn't need to be modified by the patch.

          Show
          djd Daniel John Debrunner added a comment - Some thoughts on the patch (v2): Does the patch work with sort avoidance? If a scan is against an index and there is an ORDER BY clause that matches the index then we avoid the order by. This optimization has to be turned off if NULLS FIRST is set (unless the columns are not-nullable). The passing of an extra argument to the compare() method, and for the regular compare now calling the new compare method will affect performance. I don't know how significant it will be, but within an index btree lookup compare will be called a lot, and now we are passing in an argument which will always be false. A possible alternative that would not affect existing performance, would be to have a single nulls first compare in DataType like (code only, comments you added should be kept): public final int compare(DataValueDescriptor other, boolean nullsFirst) throws StandardException { if (this.isNull() || other.isNull()) { if (!isNull()) return nullsFirst ? 1 : -1; if (!other.isNull()) return nullsFirst ? -1 : 1; return 0; // both null } return compare(other); } Moving the single argument compare() method in each data type to the parent class DataType would avoid repeated identical code in each class. However with the above comment it would mean this comment doesn't apply, but it's a idea to remember for future changes, if possible push duplicate code up the hierarchy. For IndexColumnOrder, one approach would be to have two constructors, one with the old signature which would default nulls first to false, and then the new one. This would mean that code that isn't changing doesn't need to be modified by the patch.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          The first patch did not handle collation sequences properly. The prototypeCodeNoTests_v2.diff addresses that problem. Still just a work in progress.

          Show
          bryanpendleton Bryan Pendleton added a comment - The first patch did not handle collation sequences properly. The prototypeCodeNoTests_v2.diff addresses that problem. Still just a work in progress.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          Attached is 'prototype_code_no_tests.diff', which is NOT for commit. I've started experimenting with some changes, and I wanted to attach them so I wouldn't lose them, but I've still got more work to do.

          This first attempt modifies the compare(DVD) api so that the caller can pass in a boolean indicating whether NULLS should sort FIRST or LAST, and modifies the sorter to use that new api. The patch also has changes to the grammar and the compiler to parse the user's <null ordering> specification and track it in the data structures.

          Show
          bryanpendleton Bryan Pendleton added a comment - Attached is 'prototype_code_no_tests.diff', which is NOT for commit. I've started experimenting with some changes, and I wanted to attach them so I wouldn't lose them, but I've still got more work to do. This first attempt modifies the compare(DVD) api so that the caller can pass in a boolean indicating whether NULLS should sort FIRST or LAST, and modifies the sorter to use that new api. The patch also has changes to the grammar and the compiler to parse the user's <null ordering> specification and track it in the data structures.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          I've started placing some notes about this issue on the wiki: http://wiki.apache.org/db-derby/OLAPNullOrdering

          Show
          bryanpendleton Bryan Pendleton added a comment - I've started placing some notes about this issue on the wiki: http://wiki.apache.org/db-derby/OLAPNullOrdering

            People

            • Assignee:
              bryanpendleton Bryan Pendleton
              Reporter:
              chdh@inventec.ch Christian d'Heureuse
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development