Derby
  1. Derby
  2. DERBY-4413

INSERT from SELECT DISTINCT gives assertFailure (sane), or NPE (insane) in presence of generated columns

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.5.1.1, 10.5.2.0, 10.5.3.0
    • Fix Version/s: 10.5.3.1, 10.6.1.0
    • Component/s: SQL
    • Labels:
      None
    • Urgency:
      Normal
    • Issue & fix info:
      Repro attached

      Description

      When a generated column is present in a table, an INSERT DISTINCT fail:
      Repro:

      create table t(i integer,
      j integer not null generated always as (i*66));
      insert into t values 1,2;
      insert into t select distinct i from t;

      In an insane build we see this assertFailure:

      ij version 10.5
      ij> connect 'jdbc:derby:wombat2;create=true';
      ij> create table t(i integer,
      j integer not null generated always as (i*66));
      0 rows inserted/updated/deleted
      ij> insert into t values 1,2;
      2 rows inserted/updated/deleted
      ij> insert into t select distinct i from t;
      ERROR XJ001: Java exception: 'ASSERT FAILED col[1] is null: org.apache.derby.shared.common.sanity.AssertFailure'.
      java.sql.SQLException: Java exception: 'ASSERT FAILED col[1] is null: org.apache.derby.shared.common.sanity.AssertFailure'.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
      at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:87)
      at org.apache.derby.impl.jdbc.Util.javaException(Util.java:244)
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403)
      at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346)
      at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2201)
      at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81)
      at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1323)
      at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:625)
      at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:555)
      at org.apache.derby.impl.tools.ij.ij.executeImmediate(ij.java:329)
      at org.apache.derby.impl.tools.ij.utilMain.doCatch(utilMain.java:505)
      at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:347)
      at org.apache.derby.impl.tools.ij.utilMain.go(utilMain.java:245)
      at org.apache.derby.impl.tools.ij.Main.go(Main.java:217)
      at org.apache.derby.impl.tools.ij.Main.mainCore(Main.java:184)
      at org.apache.derby.impl.tools.ij.Main.main(Main.java:75)
      at org.apache.derby.tools.ij.main(ij.java:59)
      at org.apache.derby.iapi.tools.run.main(run.java:53)
      Caused by: java.sql.SQLException: Java exception: 'ASSERT FAILED col[1] is null: org.apache.derby.shared.common.sanity.AssertFailure'.
      at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119)
      at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70)
      ... 18 more
      Caused by: org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED col[1] is null
      at org.apache.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:162)
      at org.apache.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:147)
      at org.apache.derby.impl.store.access.sort.MergeSort.checkColumnTypes(MergeSort.java:458)
      at org.apache.derby.impl.store.access.sort.MergeInserter.insert(MergeInserter.java:98)
      at org.apache.derby.impl.sql.execute.SortResultSet.loadSorter(SortResultSet.java:317)
      at org.apache.derby.impl.sql.execute.SortResultSet.openCore(SortResultSet.java:268)
      at org.apache.derby.impl.sql.execute.NormalizeResultSet.openCore(NormalizeResultSet.java:139)
      at org.apache.derby.impl.sql.execute.InsertResultSet.open(InsertResultSet.java:415)
      at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:416)
      at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:297)
      at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235)
      ... 11 more

      1. derby-4413.diff
        4 kB
        Dag H. Wanvik
      2. derby-4413.stat
        0.3 kB
        Dag H. Wanvik
      3. derby-4413-2.diff
        5 kB
        Dag H. Wanvik
      4. derby-4413-2.stat
        0.3 kB
        Dag H. Wanvik
      5. derby-4413-rollback.diff
        3 kB
        Dag H. Wanvik
      6. derby-4413-rollback.stat
        0.1 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Hide
          Mike Matrigali added a comment -

          backported all changes for this fix from trunk to 10.5, resolving as fixed and reassigning the orignal owner.

          Show
          Mike Matrigali added a comment - backported all changes for this fix from trunk to 10.5, resolving as fixed and reassigning the orignal owner.
          Hide
          Mike Matrigali added a comment -

          merged change #885659 from trunk to 10.5.

          m105_jdk16:99>svn commit
          Sending .
          Sending java/engine/org/apache/derby/impl/sql/execute/BasicSortObserver.java
          Sending java/engine/org/apache/derby/impl/store/access/sort/MergeSort.java
          Transmitting file data ..
          Committed revision 960783.

          Show
          Mike Matrigali added a comment - merged change #885659 from trunk to 10.5. m105_jdk16:99>svn commit Sending . Sending java/engine/org/apache/derby/impl/sql/execute/BasicSortObserver.java Sending java/engine/org/apache/derby/impl/store/access/sort/MergeSort.java Transmitting file data .. Committed revision 960783.
          Hide
          Mike Matrigali added a comment -

          committed backport of #829410 from trunk to 10.5 branch.

          m105_jdk16:33>svn commit
          Sending .
          Sending java/engine/org/apache/derby/impl/sql/GenericColumnDescriptor.java
          Sending java/engine/org/apache/derby/impl/sql/execute/BasicSortObserver.java
          Sending java/engine/org/apache/derby/impl/store/access/sort/MergeSort.java
          Sending java/testing/org/apache/derbyTesting/functionTests/tests/lang/GeneratedColumnsTest.java
          Transmitting file data ....
          Committed revision 960285.

          Show
          Mike Matrigali added a comment - committed backport of #829410 from trunk to 10.5 branch. m105_jdk16:33>svn commit Sending . Sending java/engine/org/apache/derby/impl/sql/GenericColumnDescriptor.java Sending java/engine/org/apache/derby/impl/sql/execute/BasicSortObserver.java Sending java/engine/org/apache/derby/impl/store/access/sort/MergeSort.java Sending java/testing/org/apache/derbyTesting/functionTests/tests/lang/GeneratedColumnsTest.java Transmitting file data .... Committed revision 960285.
          Hide
          Mike Matrigali added a comment -

          working on backporting this issue to 10.5. DERBY-4419, DERBY-4413, DERBY-4425, and DERBY-4442 all seem related. I am going to apply and checkin the backported changes to these issues in order. I think the changes necessary are:
          DERBY-4413 #829410
          DERBY-4419 #831304
          DERBY-4425 #831319
          DERBY-4442 #885421
          DERBY-4413 #885659
          DERBY-4442 #888311

          Show
          Mike Matrigali added a comment - working on backporting this issue to 10.5. DERBY-4419 , DERBY-4413 , DERBY-4425 , and DERBY-4442 all seem related. I am going to apply and checkin the backported changes to these issues in order. I think the changes necessary are: DERBY-4413 #829410 DERBY-4419 #831304 DERBY-4425 #831319 DERBY-4442 #885421 DERBY-4413 #885659 DERBY-4442 #888311
          Hide
          Lily Wei added a comment -

          Reopen to 10.5 back port

          Show
          Lily Wei added a comment - Reopen to 10.5 back port
          Hide
          Dag H. Wanvik added a comment -

          Regressions passed, committed the rollback patch as svn 885659, closing.

          Show
          Dag H. Wanvik added a comment - Regressions passed, committed the rollback patch as svn 885659, closing.
          Hide
          Dag H. Wanvik added a comment -

          After DERBY-4442 went in, the exception to the ASSERT check we made in the fix for this issue, could be rolled back, so as to provide a better internal consistency check. Attaching
          derby-4413-rollback to that end. I keep the extras test cases, though. Running regressions.

          Show
          Dag H. Wanvik added a comment - After DERBY-4442 went in, the exception to the ASSERT check we made in the fix for this issue, could be rolled back, so as to provide a better internal consistency check. Attaching derby-4413-rollback to that end. I keep the extras test cases, though. Running regressions.
          Hide
          Dag H. Wanvik added a comment -

          Committed as svn 829410, resolving. We may want to backport this fix.

          Show
          Dag H. Wanvik added a comment - Committed as svn 829410, resolving. We may want to backport this fix.
          Hide
          Dag H. Wanvik added a comment -

          Regressions ran ok with both sane and insane jars.

          Show
          Dag H. Wanvik added a comment - Regressions ran ok with both sane and insane jars.
          Hide
          Dag H. Wanvik added a comment -

          Uploading derby-4413-2, which checks that's we accept a null in the sanity check only if the column is not part of the sort key. It also adds comments as suggested. Re-running regressions with insane and sane modes (since code path varies).

          Show
          Dag H. Wanvik added a comment - Uploading derby-4413-2, which checks that's we accept a null in the sanity check only if the column is not part of the sort key. It also adds comments as suggested. Re-running regressions with insane and sane modes (since code path varies).
          Hide
          Dag H. Wanvik added a comment -

          Dag said above: "I think it would better (and more
          correct if I read the standard correctly) to compute those values when
          the row is being inserted (as is currently done for the (general)
          generated columns ("late")."

          I checked the standard again on this point, and the upshot is that
          defaults and default generated valued for columns not supplied should
          be constructed when the row is inserted:

          Section 14.11 (SQL 2008, part 2), General rules 5,6 and 7:

          5) QT is effectively evaluated before insertion of any rows into T.
          6) Let Q be the result of evaluating QT.
          7) For each row R of Q:

          a) A candidate row of T is effectively created in which the value
          of each column is its default value, as specified in the General
          Rules of Subclause 11.5, "default clause>". The
          candidate row consists of every column of T.
          :

          c) For each object column in the candidate row, let Ci be the
          object column identified by the i-th <column name> in the
          <insert column list> and let SVi be the i-th value of R.

          d) For every Ci for which one of the following conditions is true:

          i) Ci is not marked as unassigned and no underlying column of Ci
          is a self-referencing column.
          :

          the General Rules of Subclause 9.2, "Store assignment", are applied
          with Ci and SVi as TARGET and SOURCE, respectively. Ci is no longer
          marked as unassigned.

          My takeaway from this is that the defaults should be constructed as
          each row is picked form the fully evaluated result set (QA). With SQL
          2008, that result set may be sorted before insertion, e.g. as

          CREATE TABLE t (i int generated always as identity, j int)
          INSERT INTO t (j) SELECT j from sometab ORDER BY j

          With the present implementation, the auto-increment happens "early"
          (before the sort) and would not be monotonically increasing in the
          result set to be inserted, cf. Bryan's issue in DERBY-4. For general
          generated columns, it happens at the right time ("late"). So, for
          "DERBY-4397 Allow ORDER BY in subqueries", we would need to move the
          auto-increment to "late" phase also to make it correct (not just a
          performance issue).

          Show
          Dag H. Wanvik added a comment - Dag said above: "I think it would better (and more correct if I read the standard correctly) to compute those values when the row is being inserted (as is currently done for the (general) generated columns ("late")." I checked the standard again on this point, and the upshot is that defaults and default generated valued for columns not supplied should be constructed when the row is inserted: Section 14.11 (SQL 2008, part 2), General rules 5,6 and 7: 5) QT is effectively evaluated before insertion of any rows into T. 6) Let Q be the result of evaluating QT. 7) For each row R of Q: a) A candidate row of T is effectively created in which the value of each column is its default value, as specified in the General Rules of Subclause 11.5, "default clause>". The candidate row consists of every column of T. : c) For each object column in the candidate row, let Ci be the object column identified by the i-th <column name> in the <insert column list> and let SVi be the i-th value of R. d) For every Ci for which one of the following conditions is true: i) Ci is not marked as unassigned and no underlying column of Ci is a self-referencing column. : the General Rules of Subclause 9.2, "Store assignment", are applied with Ci and SVi as TARGET and SOURCE, respectively. Ci is no longer marked as unassigned. My takeaway from this is that the defaults should be constructed as each row is picked form the fully evaluated result set (QA). With SQL 2008, that result set may be sorted before insertion, e.g. as CREATE TABLE t (i int generated always as identity, j int) INSERT INTO t (j) SELECT j from sometab ORDER BY j With the present implementation, the auto-increment happens "early" (before the sort) and would not be monotonically increasing in the result set to be inserted, cf. Bryan's issue in DERBY-4 . For general generated columns, it happens at the right time ("late"). So, for " DERBY-4397 Allow ORDER BY in subqueries", we would need to move the auto-increment to "late" phase also to make it correct (not just a performance issue).
          Hide
          Dag H. Wanvik added a comment -

          Looking at the syntax for insert statement in SQL 2008, I conclude that
          an ORDER BY can not be attached to the INSERT INTO .. VALUES.

          Section 14.11 <insert statement>:

          <insert statement> ::=
          INSERT INTO <insertion target> <insert columns and source>

          <insert columns and source> ::=
          <from subquery>

          <from constructor>
          <from default>

          <from constructor> ::=
          [ <left paren> <insert column list> <right paren> ]
          [ <override clause> ]
          <contextually typed table value constructor>

          <contextually typed table value constructor> ::=
          VALUES <contextually typed row value expression list>

          This would not allow and ORDER BY. However, looking at the <from
          subquery> alternative:

          <from subquery> ::=
          [ <left paren> <insert column list> <right paren> ]
          [ <override clause> ]
          <query expression>

          <query expression> ::=
          [ <with clause> ] <query expression body>
          [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]

          we see that here we could supply an ORDER BY clause if we the VALUES
          clause can be derived from <query expression body>, and indeed it can:

          <query expression body> ::=
          <query term>

          <query expression body> UNION [ ALL DISTINCT ]
          [ <corresponding spec> ] <query term>
          <query expression body> EXCEPT [ ALL DISTINCT ]
          [ <corresponding spec> ] <query term>

          <query term> ::=
          <query primary>

          <query term> INTERSECT [ ALL DISTINCT ]
          [ <corresponding spec> ] <query primary>

          <query primary> ::=
          <simple table>

          <left paren> <query expression body>
          [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ] <right paren>

          <simple table> ::=
          <query specification>

          <table value constructor>
          <explicit table>

          <table value constructor> ::=
          VALUES <row value expression list>

          So how to resolve this?
          Finally, looking closely, I found this (14.11, Syntactic Rule 17):

          17) A <query expression> simply contained in a <from subquery> shall
          not be a <table value constructor>.

          NOTE 391 - This rule removes a syntactic ambiguity; otherwise, "VALUES (1)"
          could be parsed either as

          <insert columns and source> ::=
          <from subquery> ::=
          <query expression> ::=
          <table value constructor> ::=
          VALUES (1)

          or

          <insert columns and source> ::=
          <from constructor> ::=
          <contextually typed table value constructor> ::=
          VALUES (1)

          So, in conclusion, the <contextually typed table value constructor>
          derivation will prevail, and ORDER BY can not be attached to VALUES here.

          Show
          Dag H. Wanvik added a comment - Looking at the syntax for insert statement in SQL 2008, I conclude that an ORDER BY can not be attached to the INSERT INTO .. VALUES. Section 14.11 <insert statement>: <insert statement> ::= INSERT INTO <insertion target> <insert columns and source> <insert columns and source> ::= <from subquery> <from constructor> <from default> <from constructor> ::= [ <left paren> <insert column list> <right paren> ] [ <override clause> ] <contextually typed table value constructor> <contextually typed table value constructor> ::= VALUES <contextually typed row value expression list> This would not allow and ORDER BY. However, looking at the <from subquery> alternative: <from subquery> ::= [ <left paren> <insert column list> <right paren> ] [ <override clause> ] <query expression> <query expression> ::= [ <with clause> ] <query expression body> [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ] we see that here we could supply an ORDER BY clause if we the VALUES clause can be derived from <query expression body>, and indeed it can: <query expression body> ::= <query term> <query expression body> UNION [ ALL DISTINCT ] [ <corresponding spec> ] <query term> <query expression body> EXCEPT [ ALL DISTINCT ] [ <corresponding spec> ] <query term> <query term> ::= <query primary> <query term> INTERSECT [ ALL DISTINCT ] [ <corresponding spec> ] <query primary> <query primary> ::= <simple table> <left paren> <query expression body> [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ] <right paren> <simple table> ::= <query specification> <table value constructor> <explicit table> <table value constructor> ::= VALUES <row value expression list> So how to resolve this? Finally, looking closely, I found this (14.11, Syntactic Rule 17): 17) A <query expression> simply contained in a <from subquery> shall not be a <table value constructor>. NOTE 391 - This rule removes a syntactic ambiguity; otherwise, "VALUES (1)" could be parsed either as <insert columns and source> ::= <from subquery> ::= <query expression> ::= <table value constructor> ::= VALUES (1) or <insert columns and source> ::= <from constructor> ::= <contextually typed table value constructor> ::= VALUES (1) So, in conclusion, the <contextually typed table value constructor> derivation will prevail, and ORDER BY can not be attached to VALUES here.
          Hide
          Knut Anders Hatlen added a comment -

          Just for the record... Before DERBY-1644, enhanceRCLForInsert() did insert a ProjectRestrictNode on top of the existing node in order to add the generated columns, but only for set operations.

          Show
          Knut Anders Hatlen added a comment - Just for the record... Before DERBY-1644 , enhanceRCLForInsert() did insert a ProjectRestrictNode on top of the existing node in order to add the generated columns, but only for set operations.
          Hide
          Dag H. Wanvik added a comment -

          Thanks, Knut. Interesting corner case; I'll see if I can get any help from the standard on that one

          Show
          Dag H. Wanvik added a comment - Thanks, Knut. Interesting corner case; I'll see if I can get any help from the standard on that one
          Hide
          Knut Anders Hatlen added a comment -

          Good point about the VALUES (DEFAULT), Dag. I didn't think about that case. I wonder what that would mean if we have an ORDER BY on a generated column, like in the query below (the syntax not allowed yet - DERBY-4):

          INSERT INTO T(X) VALUES (DEFAULT), (DEFAULT) ORDER BY 1

          I'd assume that the columns that we sort on must be generated before the rows are sent through the sorter. But then it would break the rule that the value must be computed when the row is being inserted.

          I agree that the current patch is a good incremental improvement, and my question about adding an intermediary node was out of curiosity and not a request for a different approach.

          Show
          Knut Anders Hatlen added a comment - Good point about the VALUES (DEFAULT), Dag. I didn't think about that case. I wonder what that would mean if we have an ORDER BY on a generated column, like in the query below (the syntax not allowed yet - DERBY-4 ): INSERT INTO T(X) VALUES (DEFAULT), (DEFAULT) ORDER BY 1 I'd assume that the columns that we sort on must be generated before the rows are sent through the sorter. But then it would break the rule that the value must be computed when the row is being inserted. I agree that the current patch is a good incremental improvement, and my question about adding an intermediary node was out of curiosity and not a request for a different approach.
          Hide
          Dag H. Wanvik added a comment -

          Thanks for the comments, guys. Yes, I will add the assert that the
          column is not part of the key; good improvement, and some more
          comments.

          Currently, columns that are auto generated (BY IDENTITY) are
          constructed early (general generated columns have only the stubs) and
          will be along in the sort. That early approach also led to the the
          effect that Bryan saw in DERBY-4. I think it would better (and more
          correct if I read the standard correctly) to compute those values when
          the row is being inserted (as is currently done for the (general)
          generated columns ("late"). I'd like to look at that, but I think this
          patch could go in an an incremental improvement first.

          Interesting about the similarity with DERBY-4419 and DERBY-4420,
          Knut. As for adding an intermediary node, I guess that could work for
          SELECT, but would it work for a VALUES clause that specified
          "DEFAULT"? It seems we would have to remove explicit DEFAULT column
          values from the RCL to get uniform treatment then? (I.e. a result set
          tree devoid of any generated columns, and maybe of plain DEFAULT
          columns as well?) However, if we managed to do this, it might solve
          4419/4420 too. As for cheaper, sorting the stubs for (general)
          generated columns is probably cheaper than sorting the BY IDENTITY
          values, since those columns are not empty. Instantiating By IDENTITY
          (and possible plain DEFAULT) late rather than early would save (more)
          cycles, too.

          Show
          Dag H. Wanvik added a comment - Thanks for the comments, guys. Yes, I will add the assert that the column is not part of the key; good improvement, and some more comments. Currently, columns that are auto generated (BY IDENTITY) are constructed early (general generated columns have only the stubs) and will be along in the sort. That early approach also led to the the effect that Bryan saw in DERBY-4 . I think it would better (and more correct if I read the standard correctly) to compute those values when the row is being inserted (as is currently done for the (general) generated columns ("late"). I'd like to look at that, but I think this patch could go in an an incremental improvement first. Interesting about the similarity with DERBY-4419 and DERBY-4420 , Knut. As for adding an intermediary node, I guess that could work for SELECT, but would it work for a VALUES clause that specified "DEFAULT"? It seems we would have to remove explicit DEFAULT column values from the RCL to get uniform treatment then? (I.e. a result set tree devoid of any generated columns, and maybe of plain DEFAULT columns as well?) However, if we managed to do this, it might solve 4419/4420 too. As for cheaper, sorting the stubs for (general) generated columns is probably cheaper than sorting the BY IDENTITY values, since those columns are not empty. Instantiating By IDENTITY (and possible plain DEFAULT) late rather than early would save (more) cycles, too.
          Hide
          Rick Hillegas added a comment -

          Thanks for the tidy patch, Dag. I agree with Knut that it might be good to assert that the null value is not part of the sort key. Thanks.

          Show
          Rick Hillegas added a comment - Thanks for the tidy patch, Dag. I agree with Knut that it might be good to assert that the null value is not part of the sort key. Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          You have probably already considered this, but do you think it would work to make enhanceAndCheckForAutoIncrement() insert a node between the InsertNode and its child, and modify the RCL of the inserted node instead of the RCL of the child? I was thinking that it might make the code more robust against bugs such as this one, DERBY-4419 and DERBY-4420. And it also sounds (slightly) cheaper to add the stubs for the generated columns after the result has been sorted, as there will be less data to process for the sorter.

          Show
          Knut Anders Hatlen added a comment - You have probably already considered this, but do you think it would work to make enhanceAndCheckForAutoIncrement() insert a node between the InsertNode and its child, and modify the RCL of the inserted node instead of the RCL of the child? I was thinking that it might make the code more robust against bugs such as this one, DERBY-4419 and DERBY-4420 . And it also sounds (slightly) cheaper to add the stubs for the generated columns after the result has been sorted, as there will be less data to process for the sorter.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for the fix, Dag. Perhaps it would be worth adding one-line comments in BasicSortObserver and MergeSort that state when the column is expected to be null, and perhaps also that the column cannot be part of the sort key. Instead of removing the assert in MergeSort when the column is null, we may want to replace it with a check for whether the column is part of the sort key, and fail if it is.

          Show
          Knut Anders Hatlen added a comment - Thanks for the fix, Dag. Perhaps it would be worth adding one-line comments in BasicSortObserver and MergeSort that state when the column is expected to be null, and perhaps also that the column cannot be part of the sort key. Instead of removing the assert in MergeSort when the column is null, we may want to replace it with a check for whether the column is part of the sort key, and fail if it is.
          Hide
          Dag H. Wanvik added a comment -

          For the record, I found this issue when trying to warm up Bryan's patch for DERBY-4 to allow
          INSERT with ORDER BY on the result set as part of investigating DERBY-4397 (ORDER BY in subqueries).
          Since DISTINCT in this issue requires a sort, the cases hit the same problem.

          Show
          Dag H. Wanvik added a comment - For the record, I found this issue when trying to warm up Bryan's patch for DERBY-4 to allow INSERT with ORDER BY on the result set as part of investigating DERBY-4397 (ORDER BY in subqueries). Since DISTINCT in this issue requires a sort, the cases hit the same problem.
          Hide
          Dag H. Wanvik added a comment - - edited

          Attaching a patch which makes DISTINCT work. It turns out that
          untyped nulls (as used in the query tree for generated columns), do
          not generate any code, so the column will be empty while the result
          set to be INSERTed is being constructed (at execution time). This is
          OK since the value will be filled in later (by InsertResultSet's call
          evaluateGenerationClauses).

          This small patch makes sorted result sets accept (ignore, really)
          empty columns, which is OK I think, since they are not part of the
          sort key anyway. Added a new test case to GeneratedColumnsTest.

          Running regressions.

          Show
          Dag H. Wanvik added a comment - - edited Attaching a patch which makes DISTINCT work. It turns out that untyped nulls (as used in the query tree for generated columns), do not generate any code, so the column will be empty while the result set to be INSERTed is being constructed (at execution time). This is OK since the value will be filled in later (by InsertResultSet's call evaluateGenerationClauses). This small patch makes sorted result sets accept (ignore, really) empty columns, which is OK I think, since they are not part of the sort key anyway. Added a new test case to GeneratedColumnsTest. Running regressions.
          Hide
          Dag H. Wanvik added a comment - - edited

          If I use a default value rather than a GENERATED ALWAYS AS, it works
          as expected.

          It seems the generated values are computed deferredly,
          i.e. when row is to be inserted (InsertResultSet), whereas default
          values are generated along with the rest of the columns. This causes a
          problem for the sorter in the former case.

          Default values are handled via this logic:

          Cf. InsertNode#bindStatement -> InsertNode#enhanceAndCheckForAutoincrement ->
          ResultSetNode#enhanceRCLForInsert -> genNewRCForInsert (ca line 1149)

          whereas generated columns are explicitly deferred in that code
          and handled here:
          InsertNode#bindStatement -> parseAndBindGenerationClauses

          Since generated columns are not bound down through the "enhanced"
          result set's RCL, code is not generated for the "null" constants left
          by enhanceRCLForInsert, and the sorter crashes.

          Note that the RC has an expression containing an untyped NULL,
          so it seems the column is not well-formed when the sorter sees it.

          org.apache.derby.impl.sql.compile.ResultColumn@9dca26
          exposedName: null
          name: J
          tableName: null
          isDefaultColumn: false
          wasDefaultColumn: false
          isNameGenerated: false
          sourceTableName: null
          type: INTEGER NOT NULL
          columnDescriptor: columnName: J
          columnPosition: 2
          columnType: INTEGER NOT NULL
          columnDefault: null
          uuid: eb0f4097-0124-6cc9-1ba1-000003db0968
          defaultUUID: c3350098-0124-6cc9-1ba1-000003db0968

          isGenerated: false
          isGeneratedForUnmatchedColumnInInsert: true
          isGroupingColumn: false
          isReferenced: true
          isRedundant: false
          virtualColumnId: 2
          resultSetNumber: -1
          dataTypeServices: INTEGER NOT NULL
          expression:
          org.apache.derby.impl.sql.compile.UntypedNullConstantNode@1429cb2
          value: null
          dataTypeServices: null

          Show
          Dag H. Wanvik added a comment - - edited If I use a default value rather than a GENERATED ALWAYS AS, it works as expected. It seems the generated values are computed deferredly, i.e. when row is to be inserted (InsertResultSet), whereas default values are generated along with the rest of the columns. This causes a problem for the sorter in the former case. Default values are handled via this logic: Cf. InsertNode#bindStatement -> InsertNode#enhanceAndCheckForAutoincrement -> ResultSetNode#enhanceRCLForInsert -> genNewRCForInsert (ca line 1149) whereas generated columns are explicitly deferred in that code and handled here: InsertNode#bindStatement -> parseAndBindGenerationClauses Since generated columns are not bound down through the "enhanced" result set's RCL, code is not generated for the "null" constants left by enhanceRCLForInsert, and the sorter crashes. Note that the RC has an expression containing an untyped NULL, so it seems the column is not well-formed when the sorter sees it. org.apache.derby.impl.sql.compile.ResultColumn@9dca26 exposedName: null name: J tableName: null isDefaultColumn: false wasDefaultColumn: false isNameGenerated: false sourceTableName: null type: INTEGER NOT NULL columnDescriptor: columnName: J columnPosition: 2 columnType: INTEGER NOT NULL columnDefault: null uuid: eb0f4097-0124-6cc9-1ba1-000003db0968 defaultUUID: c3350098-0124-6cc9-1ba1-000003db0968 isGenerated: false isGeneratedForUnmatchedColumnInInsert: true isGroupingColumn: false isReferenced: true isRedundant: false virtualColumnId: 2 resultSetNumber: -1 dataTypeServices: INTEGER NOT NULL expression: org.apache.derby.impl.sql.compile.UntypedNullConstantNode@1429cb2 value: null dataTypeServices: null

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Dag H. Wanvik
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development