Uploaded image for project: 'Derby'
  1. Derby
  2. DERBY-3310

ASSERT in MergeSort.checkColumnTypes() disallow legal type conversions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3
    • Fix Version/s: 10.4.1.3, 10.5.1.1
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      Patch Available
    • Bug behavior facts:
      Regression

      Description

      The following code

      CREATE TABLE U (SNAME VARCHAR(32000), TNAME VARCHAR(32000), C1 BIGINT);
      – This triggers an ASSERT (because 2 is INTEGER and not BIGINT)
      INSERT INTO U(SNAME, TNAME, C1) SELECT DISTINCT SCHEMANAME, TABLENAME, 2
      FROM SYS.SYSTABLES T JOIN SYS.SYSSCHEMAS S ON T.SCHEMAID = S.SCHEMAID;

      gives

      ERROR XJ001: Java exception: 'ASSERT FAILED col1.getClass() (class org.apache.derby.iapi.types.SQLInteger) expected to be the same as col2.getClass() (class org.apache.derby.iapi.types.SQLLongint): org.apache.derby.shared.common.sanity.AssertFailure'.

      1. derby-3310_try1_diff.txt
        4 kB
        Kathey Marsden
      2. derby3310_rsn_cleanup_1.txt
        6 kB
        Daniel John Debrunner
      3. derby-3310_remove_genNormalizeResultSetNode_diff.txt
        8 kB
        Kathey Marsden
      4. derby-3310_diff.txt
        5 kB
        Kathey Marsden
      5. derby-3310_diff.txt
        4 kB
        Kathey Marsden
      6. d3310_writeup_1.html
        99 kB
        A B
      7. cast-repro.sql
        0.6 kB
        Dyre Tjeldvoll

        Issue Links

          Activity

          Hide
          dyret Dyre Tjeldvoll added a comment -

          Attaching a repro in the form of an ij-script

          Show
          dyret Dyre Tjeldvoll added a comment - Attaching a repro in the form of an ij-script
          Hide
          dyret Dyre Tjeldvoll added a comment -

          This issue was first seen while trying to reproduce DERBY-3221 in sane mode.

          Show
          dyret Dyre Tjeldvoll added a comment - This issue was first seen while trying to reproduce DERBY-3221 in sane mode.
          Hide
          dyret Dyre Tjeldvoll added a comment -

          Marking this as a regression since it worked fine in rev 540921, and possibly later.

          Show
          dyret Dyre Tjeldvoll added a comment - Marking this as a regression since it worked fine in rev 540921, and possibly later.
          Hide
          dyret Dyre Tjeldvoll added a comment -

          Seems like this was introduced by
          ------------------------------------------------------------------------
          r554012 | djd | 2007-07-06 21:40:07 +0200 (Fri, 06 Jul 2007) | 4 lines

          DERBY-2775 (partial) Clarify some type handling for ResultColumn and its expression by making
          callers explictly get the expression's type from the expression, rather than having
          a method in ResultColumn that doesn't do what it indicates it did.
          Make VirtualColumnNode's type just refect its source node.
          ------------------------------------------------------------------------

          Show
          dyret Dyre Tjeldvoll added a comment - Seems like this was introduced by ------------------------------------------------------------------------ r554012 | djd | 2007-07-06 21:40:07 +0200 (Fri, 06 Jul 2007) | 4 lines DERBY-2775 (partial) Clarify some type handling for ResultColumn and its expression by making callers explictly get the expression's type from the expression, rather than having a method in ResultColumn that doesn't do what it indicates it did. Make VirtualColumnNode's type just refect its source node. ------------------------------------------------------------------------
          Hide
          kmarsden Kathey Marsden added a comment -

          I wonder if the assertion in MergeSort should just be removed or is there a better assertion that could be used to make sure the types are assignable?
          From MergeSort ...
          if (col1.getClass() != col2.getClass())

          { SanityManager.THROWASSERT( "col1.getClass() (" + col1.getClass() + ") expected to be the same as col2.getClass() (" + col2.getClass() + ")"); }
          Show
          kmarsden Kathey Marsden added a comment - I wonder if the assertion in MergeSort should just be removed or is there a better assertion that could be used to make sure the types are assignable? From MergeSort ... if (col1.getClass() != col2.getClass()) { SanityManager.THROWASSERT( "col1.getClass() (" + col1.getClass() + ") expected to be the same as col2.getClass() (" + col2.getClass() + ")"); }
          Hide
          djd Daniel John Debrunner added a comment -

          Before changing the assert it would be good to understand why the types are different. I thought that if the types were different then a cast node is required to make the types the same when assigning one value to another.

          Show
          djd Daniel John Debrunner added a comment - Before changing the assert it would be good to understand why the types are different. I thought that if the types were different then a cast node is required to make the types the same when assigning one value to another.
          Hide
          kmarsden Kathey Marsden added a comment -

          I looked a little at the change that introduced the assertion, and narrowed it to this change in VirtualColumnNode.java, specifically the override of the setType method to set the type of the source column.
          http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/VirtualColumnNode.java?p2=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&p1=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&r1=554012&r2=554011&view=diff&pathrev=554012

          Before this change the types when passing through the assertion code were both SQLInteger, now one is SQLInteger and the other is SQLLongInt.

          I am not clear yet whether both should be SQLInteger or SQLLongInt at this point.

          Show
          kmarsden Kathey Marsden added a comment - I looked a little at the change that introduced the assertion, and narrowed it to this change in VirtualColumnNode.java, specifically the override of the setType method to set the type of the source column. http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/VirtualColumnNode.java?p2=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&p1=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&r1=554012&r2=554011&view=diff&pathrev=554012 Before this change the types when passing through the assertion code were both SQLInteger, now one is SQLInteger and the other is SQLLongInt. I am not clear yet whether both should be SQLInteger or SQLLongInt at this point.
          Hide
          kmarsden Kathey Marsden added a comment -

          So things seem to work ok without the assert because SQLLongint.setValue()
          calls getLong() on the value it is being set from so we get a compatible value.

          protected void setFrom(DataValueDescriptor theValue) throws StandardException

          { setValue(theValue.getLong()); }

          Here is the trace where it is called.

          Thread [main] (Suspended (breakpoint at line 94 in SQLInteger))
          SQLInteger.getLong() line: 94
          SQLLongint.setFrom(DataValueDescriptor) line: 440
          SQLLongint(DataType).setValue(DataValueDescriptor) line: 490
          SQLLongint(DataType).normalize(DataTypeDescriptor, DataValueDescriptor) line: 662
          DataTypeDescriptor.normalize(DataValueDescriptor, DataValueDescriptor) line: 469
          NormalizeResultSet.normalizeRow(ExecRow) line: 330
          NormalizeResultSet.getNextRowCore() line: 189
          InsertResultSet(DMLWriteResultSet).getNextRowCore(NoPutResultSet) line: 127
          InsertResultSet.open() line: 496
          GenericPreparedStatement.execute(Activation, boolean, long) line: 370
          EmbedStatement40(EmbedStatement).executeStatement(Activation, boolean, boolean) line: 1203
          EmbedStatement40(EmbedStatement).execute(String, boolean, boolean, int, int[], String[]) line: 596
          EmbedStatement40(EmbedStatement).execute(String) line: 528
          ij.executeImmediate(String) line: 330
          utilMain14(utilMain).doCatch(String) line: 522
          utilMain14(utilMain).runScriptGuts() line: 364
          utilMain14(utilMain).go(LocalizedInput[], LocalizedOutput, Properties) line: 262
          Main14(Main).go(LocalizedInput, LocalizedOutput, Properties) line: 215
          Main.mainCore(String[], Main) line: 181
          Main14.main(String[]) line: 56
          ij.main(String[]) line: 71

          Is a cast still needed?

          Show
          kmarsden Kathey Marsden added a comment - So things seem to work ok without the assert because SQLLongint.setValue() calls getLong() on the value it is being set from so we get a compatible value. protected void setFrom(DataValueDescriptor theValue) throws StandardException { setValue(theValue.getLong()); } Here is the trace where it is called. Thread [main] (Suspended (breakpoint at line 94 in SQLInteger)) SQLInteger.getLong() line: 94 SQLLongint.setFrom(DataValueDescriptor) line: 440 SQLLongint(DataType).setValue(DataValueDescriptor) line: 490 SQLLongint(DataType).normalize(DataTypeDescriptor, DataValueDescriptor) line: 662 DataTypeDescriptor.normalize(DataValueDescriptor, DataValueDescriptor) line: 469 NormalizeResultSet.normalizeRow(ExecRow) line: 330 NormalizeResultSet.getNextRowCore() line: 189 InsertResultSet(DMLWriteResultSet).getNextRowCore(NoPutResultSet) line: 127 InsertResultSet.open() line: 496 GenericPreparedStatement.execute(Activation, boolean, long) line: 370 EmbedStatement40(EmbedStatement).executeStatement(Activation, boolean, boolean) line: 1203 EmbedStatement40(EmbedStatement).execute(String, boolean, boolean, int, int[], String[]) line: 596 EmbedStatement40(EmbedStatement).execute(String) line: 528 ij.executeImmediate(String) line: 330 utilMain14(utilMain).doCatch(String) line: 522 utilMain14(utilMain).runScriptGuts() line: 364 utilMain14(utilMain).go(LocalizedInput[], LocalizedOutput, Properties) line: 262 Main14(Main).go(LocalizedInput, LocalizedOutput, Properties) line: 215 Main.mainCore(String[], Main) line: 181 Main14.main(String[]) line: 56 ij.main(String[]) line: 71 Is a cast still needed?
          Hide
          djd Daniel John Debrunner added a comment - - edited

          In this specific case a cast is not needed, but I'm thinking more about the general case, so if it had been a case where conversion was needed
          would the behaviour be correct?

          So:
          is the assert wrong because the normalize node is doing the correct thing (e.g it knows INT -> IBIGNT is a safe conversion, no cast node needed),
          or is the assert correct and the normalize node is not putting in conversion code
          or is the normalize node and assert not related (my guess)

          We are trying to insert a value of type A into a column of type B, but there is the additional distinct step.
          so the query is doing something like:

          sort of (rows with value of type A) ==> insert into table with column of type B

          The assert is saying that at some point some types must match during a merge sort, which seems reasonable, but this would
          be in the step before the normalize (which is the insert step). So why are the types different there?

          Show
          djd Daniel John Debrunner added a comment - - edited In this specific case a cast is not needed, but I'm thinking more about the general case, so if it had been a case where conversion was needed would the behaviour be correct? So: is the assert wrong because the normalize node is doing the correct thing (e.g it knows INT -> IBIGNT is a safe conversion, no cast node needed), or is the assert correct and the normalize node is not putting in conversion code or is the normalize node and assert not related (my guess) We are trying to insert a value of type A into a column of type B, but there is the additional distinct step. so the query is doing something like: sort of (rows with value of type A) ==> insert into table with column of type B The assert is saying that at some point some types must match during a merge sort, which seems reasonable, but this would be in the step before the normalize (which is the insert step). So why are the types different there?
          Hide
          kmarsden Kathey Marsden added a comment -

          So for our specific case we have
          sort of (rows with value of type INTEGER) ==> insert into table with column of type BIGINT

          So it makes sense to me that at the sort phase, both the row and the "template" should be INTEGER as was the case prior to the change for DERBY-2775. Now the template column type is BIGINT and the row column type is INTEGER.
          Could you explain what the template is in this context and whether you think it should be INTEGER or BIGINT at this point in the processing?

          Show
          kmarsden Kathey Marsden added a comment - So for our specific case we have sort of (rows with value of type INTEGER) ==> insert into table with column of type BIGINT So it makes sense to me that at the sort phase, both the row and the "template" should be INTEGER as was the case prior to the change for DERBY-2775 . Now the template column type is BIGINT and the row column type is INTEGER. Could you explain what the template is in this context and whether you think it should be INTEGER or BIGINT at this point in the processing?
          Hide
          djd Daniel John Debrunner added a comment -

          What do you mean by template, which field in which class?

          A template row for the store is a row that contains the correct DataValueDescriptor objects for the desired type of the row,
          so what it should be depends on context.

          E.g. if this is true:

          sort of (rows with value of type INTEGER) ==> insert into table with column of type BIGINT

          then the template for the sort would have a SQLInteger, a template for the table corresponding to the insert would have an SQLLongint.

          However, I don't know enough about the details of language to know if the sort and insert are treated as logically separate items or if they
          are merged in some way.

          Show
          djd Daniel John Debrunner added a comment - What do you mean by template, which field in which class? A template row for the store is a row that contains the correct DataValueDescriptor objects for the desired type of the row, so what it should be depends on context. E.g. if this is true: sort of (rows with value of type INTEGER) ==> insert into table with column of type BIGINT then the template for the sort would have a SQLInteger, a template for the table corresponding to the insert would have an SQLLongint. However, I don't know enough about the details of language to know if the sort and insert are treated as logically separate items or if they are merged in some way.
          Hide
          kmarsden Kathey Marsden added a comment -

          I was referring to the template field in the MergeSort class.

          If "the template for the sort would have a SQLInteger," then I take it our pre DERBY-2775 behavior was
          correct that the template was a SQLInteger and not a SQLLongInt.

          Is the change for VirtualColumnNode getting its type from its source node flawed in some way?
          http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/VirtualColumnNode.java?p2=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&p1=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&r1=554012&r2=554011&view=diff&pathrev=554012

          Show
          kmarsden Kathey Marsden added a comment - I was referring to the template field in the MergeSort class. If "the template for the sort would have a SQLInteger," then I take it our pre DERBY-2775 behavior was correct that the template was a SQLInteger and not a SQLLongInt. Is the change for VirtualColumnNode getting its type from its source node flawed in some way? http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/VirtualColumnNode.java?p2=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&p1=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&r1=554012&r2=554011&view=diff&pathrev=554012
          Hide
          mikem Mike Matrigali added a comment -

          I think the assert code is correctly testing for what it assumes to be true - ie. that class types of the sort template will exactly match subsequent rows fed to the sort. It may work without the assert in this case but I am not sure that will always be true.
          Basically the sort interface requires a "template" of what rows will look like when fed into the sort. So column[N] of each row fed into the sort should be the exact same type as template[N].

          One could build a sorter that compared different types in the same column but I didn't think Derby currently required that.

          Show
          mikem Mike Matrigali added a comment - I think the assert code is correctly testing for what it assumes to be true - ie. that class types of the sort template will exactly match subsequent rows fed to the sort. It may work without the assert in this case but I am not sure that will always be true. Basically the sort interface requires a "template" of what rows will look like when fed into the sort. So column [N] of each row fed into the sort should be the exact same type as template [N] . One could build a sorter that compared different types in the same column but I didn't think Derby currently required that.
          Hide
          kmarsden Kathey Marsden added a comment -

          I think then we get back to my question of whether both template and row data should be SQLInteger or both should be SQLLongint at the time of the sort.

          BTW this is the trace from the point where we now change the type of the sourceColumn from INTEGER NOT NULL to BIGINT.
          I think I need to better understand the VirtualColumnNode change and why it was made.
          Thread [main] (Suspended (breakpoint at line 284 in VirtualColumnNode))
          VirtualColumnNode.setType(DataTypeDescriptor) line: 284
          ResultColumnList.copyTypesAndLengthsToSource(ResultColumnList) line: 1645
          InsertNode.bindStatement() line: 426
          GenericStatement.prepMinion(LanguageConnectionContext, boolean, Object[], SchemaDescriptor, boolean) line: 314
          GenericStatement.prepare(LanguageConnectionContext, boolean) line: 88
          GenericLanguageConnectionContext.prepareInternalStatement(SchemaDescriptor, String, boolean, boolean) line: 768
          EmbedStatement40(EmbedStatement).execute(String, boolean, boolean, int, int[], String[]) line: 607
          EmbedStatement40(EmbedStatement).execute(String) line: 556
          ij.executeImmediate(String) line: 330
          utilMain.doCatch(String) line: 508
          utilMain.runScriptGuts() line: 350
          utilMain.go(LocalizedInput[], LocalizedOutput, Properties) line: 248
          Main.go(LocalizedInput, LocalizedOutput, Properties) line: 215
          Main.mainCore(String[], Main) line: 181
          Main.main(String[]) line: 73
          ij.main(String[]) line: 59

          Show
          kmarsden Kathey Marsden added a comment - I think then we get back to my question of whether both template and row data should be SQLInteger or both should be SQLLongint at the time of the sort. BTW this is the trace from the point where we now change the type of the sourceColumn from INTEGER NOT NULL to BIGINT. I think I need to better understand the VirtualColumnNode change and why it was made. Thread [main] (Suspended (breakpoint at line 284 in VirtualColumnNode)) VirtualColumnNode.setType(DataTypeDescriptor) line: 284 ResultColumnList.copyTypesAndLengthsToSource(ResultColumnList) line: 1645 InsertNode.bindStatement() line: 426 GenericStatement.prepMinion(LanguageConnectionContext, boolean, Object[], SchemaDescriptor, boolean) line: 314 GenericStatement.prepare(LanguageConnectionContext, boolean) line: 88 GenericLanguageConnectionContext.prepareInternalStatement(SchemaDescriptor, String, boolean, boolean) line: 768 EmbedStatement40(EmbedStatement).execute(String, boolean, boolean, int, int[], String[]) line: 607 EmbedStatement40(EmbedStatement).execute(String) line: 556 ij.executeImmediate(String) line: 330 utilMain.doCatch(String) line: 508 utilMain.runScriptGuts() line: 350 utilMain.go(LocalizedInput[], LocalizedOutput, Properties) line: 248 Main.go(LocalizedInput, LocalizedOutput, Properties) line: 215 Main.mainCore(String[], Main) line: 181 Main.main(String[]) line: 73 ij.main(String[]) line: 59
          Hide
          kmarsden Kathey Marsden added a comment -

          I was hoping to pick up this issue, but must admit I am quite stuck. I am really not sure at the time of the sort whether both template and row data should be SQLInteger or SQLLongInt. If someone could explain which and why, I might better understand my destination and be able to make some progress on the issue. Before the DERBY-2775 both were SQLInteger but I don't know that that was correct. It may have been wrong for both values. Any help or hints would be greatly appreciated.

          Thanks
          Kathey

          Show
          kmarsden Kathey Marsden added a comment - I was hoping to pick up this issue, but must admit I am quite stuck. I am really not sure at the time of the sort whether both template and row data should be SQLInteger or SQLLongInt. If someone could explain which and why, I might better understand my destination and be able to make some progress on the issue. Before the DERBY-2775 both were SQLInteger but I don't know that that was correct. It may have been wrong for both values. Any help or hints would be greatly appreciated. Thanks Kathey
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          Hi Kathey,

          Beware, you're getting a WAG here. But my opinion is that
          SQLInteger is the correct datatype, and SQLLongInt is incorrect.

          I'm guessing that the overall query tree looks something like:

          • InsertNode
          • NormalizeResultSetNode
          • SelectNode / ProjectRestrictNode

          It seems to me that the data being sorted is the SELECT data,
          and that data uses the constant 2, and so that data should have
          been sorted as simple integers.

          Then, the NormalizeResultSetNode should have generated
          a CAST operation to convert the integer to a biginit when the
          data is retrieved from the sorter.

          And, the VirtualColumnNodes used by the InsertNode should
          have pointed to the values returned by the NormalizeResultSetNode,
          so it seems correct that the VCN instance that the InsertNode
          references should be working with a SQLLongInt.

          So I'd suggest investigating the NormalizeResultSetNode, and
          whether it is generating the proper buffering CAST operations.

          Show
          bryanpendleton Bryan Pendleton added a comment - Hi Kathey, Beware, you're getting a WAG here. But my opinion is that SQLInteger is the correct datatype, and SQLLongInt is incorrect. I'm guessing that the overall query tree looks something like: InsertNode NormalizeResultSetNode SelectNode / ProjectRestrictNode It seems to me that the data being sorted is the SELECT data, and that data uses the constant 2, and so that data should have been sorted as simple integers. Then, the NormalizeResultSetNode should have generated a CAST operation to convert the integer to a biginit when the data is retrieved from the sorter. And, the VirtualColumnNodes used by the InsertNode should have pointed to the values returned by the NormalizeResultSetNode, so it seems correct that the VCN instance that the InsertNode references should be working with a SQLLongInt. So I'd suggest investigating the NormalizeResultSetNode, and whether it is generating the proper buffering CAST operations.
          Hide
          kmarsden Kathey Marsden added a comment -

          VirtualColumnNode has this comment
          /* A VirtualColumnNode contains a pointer to the immediate child result

          • that is materializing the virtual column and the ResultColumn
          • that represents that materialization.
            */
            private ResultSetNode sourceResultSet;
            private ResultColumn sourceColumn;

          The DERBY-2775 change overrode the setType from ValueNode:
          public void setType(DataTypeDescriptor dataTypeServices) throws StandardException

          { this.dataTypeServices = dataTypeServices; }

          with

          public void setType(DataTypeDescriptor dtd) throws StandardException

          { sourceColumn.setType(dtd); }

          Doesn't this end up incorrectly end up affecting the child result, changing it from
          SQLInteger to SQLLongInt?

          Show
          kmarsden Kathey Marsden added a comment - VirtualColumnNode has this comment /* A VirtualColumnNode contains a pointer to the immediate child result that is materializing the virtual column and the ResultColumn that represents that materialization. */ private ResultSetNode sourceResultSet; private ResultColumn sourceColumn; The DERBY-2775 change overrode the setType from ValueNode: public void setType(DataTypeDescriptor dataTypeServices) throws StandardException { this.dataTypeServices = dataTypeServices; } with public void setType(DataTypeDescriptor dtd) throws StandardException { sourceColumn.setType(dtd); } Doesn't this end up incorrectly end up affecting the child result, changing it from SQLInteger to SQLLongInt?
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          If the "immediate child result" of the InsertNode is a Normalize node,
          then I think the VCN instances for the columns in the InsertNode
          should have sourceResultSet pointing to a NormalizeResultSet,
          and sourceColumn pointing to a column in the NRS's ResultColumnList.

          In that case, it seems correct for that sourceColumn to be of type longint,
          because it represents the result of the normalization.

          That is:

          • the PRN should sort values with integer type, and feed them up to the NRSN
          • the NRS should invoke normalizeRow() to convert the int to a long
          • the InsertNode should pull long values from the NRS and insert them
            into the target table.

          However, it looks like maybe this was the intent of the design, but
          never completed? Here's a snip from a comment in NormalizeResultSet.java:

          • Normalize a row. For now, this means calling constructors through
          • the type services to normalize a type to itself. For example,
          • if you're putting a char(30) value into a char(15) column, it
          • calls a SQLChar constructor with the char(30) value, and the
          • constructor truncates the value and makes sure that no non-blank
          • characters are truncated.
            *
          • In the future, this mechanism will be extended to do type conversions
          • as well. I didn't implement type conversions yet because it looks
          • like a lot of work, and we needed char and varchar right away.

          As I said before, though, I'm just guessing. It would be good to have some more
          eyes and thoughts on how this is supposed to behave.

          Show
          bryanpendleton Bryan Pendleton added a comment - If the "immediate child result" of the InsertNode is a Normalize node, then I think the VCN instances for the columns in the InsertNode should have sourceResultSet pointing to a NormalizeResultSet, and sourceColumn pointing to a column in the NRS's ResultColumnList. In that case, it seems correct for that sourceColumn to be of type longint, because it represents the result of the normalization. That is: the PRN should sort values with integer type, and feed them up to the NRSN the NRS should invoke normalizeRow() to convert the int to a long the InsertNode should pull long values from the NRS and insert them into the target table. However, it looks like maybe this was the intent of the design, but never completed? Here's a snip from a comment in NormalizeResultSet.java: Normalize a row. For now, this means calling constructors through the type services to normalize a type to itself. For example, if you're putting a char(30) value into a char(15) column, it calls a SQLChar constructor with the char(30) value, and the constructor truncates the value and makes sure that no non-blank characters are truncated. * In the future, this mechanism will be extended to do type conversions as well. I didn't implement type conversions yet because it looks like a lot of work, and we needed char and varchar right away. As I said before, though, I'm just guessing. It would be good to have some more eyes and thoughts on how this is supposed to behave.
          Hide
          kmarsden Kathey Marsden added a comment -

          Thank you so much Bryan for your help with this issue.
          I am a bit of a novice in this area so forgive if my questions
          are basic. You said:

          • the PRN should sort values with integer type, and feed them up to the NRSN
          • the NRS should invoke normalizeRow() to convert the int to a long
          • the InsertNode should pull long values from the NRS and insert them
            into the target table.

          and also said in an earlier comment.
          >So I'd suggest investigating the NormalizeResultSetNode, and
          whether it is generating the proper buffering CAST operations.

          If the sort happens before normalization, how could CAST operations in
          NormalizeResultSetNode have an effect on the sort?

          Show
          kmarsden Kathey Marsden added a comment - Thank you so much Bryan for your help with this issue. I am a bit of a novice in this area so forgive if my questions are basic. You said: the PRN should sort values with integer type, and feed them up to the NRSN the NRS should invoke normalizeRow() to convert the int to a long the InsertNode should pull long values from the NRS and insert them into the target table. and also said in an earlier comment. >So I'd suggest investigating the NormalizeResultSetNode, and whether it is generating the proper buffering CAST operations. If the sort happens before normalization, how could CAST operations in NormalizeResultSetNode have an effect on the sort?
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          By itself, CAST operations in NormalizeResultSet would not fix
          the sort problem you are seeing, I agree.

          My comments are with respect to your earlier question:
          > I am really not sure at the time of the sort whether both template and row data
          > should be SQLInteger or SQLLongInt. If someone could explain which and why

          I think I am trying to propose a two-part theory:
          1) at the time of the sort, both template and row data should be SQLInteger.
          2) Since the Insert requires a SQLLongInteger, I think the conversion from
          Int to LongInt should occur in NormalizeResultSet, rather than in the sort.

          So working on NormalizeResultSet by itself would be only part of the solution,
          it still would also be necessary to change things so that the sort doesn't see
          the type mismatch.

          Does that make more sense?

          thanks,

          bryan

          Show
          bryanpendleton Bryan Pendleton added a comment - By itself, CAST operations in NormalizeResultSet would not fix the sort problem you are seeing, I agree. My comments are with respect to your earlier question: > I am really not sure at the time of the sort whether both template and row data > should be SQLInteger or SQLLongInt. If someone could explain which and why I think I am trying to propose a two-part theory: 1) at the time of the sort, both template and row data should be SQLInteger. 2) Since the Insert requires a SQLLongInteger, I think the conversion from Int to LongInt should occur in NormalizeResultSet, rather than in the sort. So working on NormalizeResultSet by itself would be only part of the solution, it still would also be necessary to change things so that the sort doesn't see the type mismatch. Does that make more sense? thanks, bryan
          Hide
          kmarsden Kathey Marsden added a comment -

          Thanks Bryan, that makes sense to me. My primary focus
          is to change the sort so it doesn't see the type mismatch.

          From what you say, the change to VirtualColumnNode
          http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/VirtualColumnNode.java?p2=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&p1=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&r1=554012&r2=554011&view=diff&pathrev=554012
          should not have affected the sort phase, but clearly it did.

          I think it would help a lot to understand why this change was made. Dan could you elaborate?

          Show
          kmarsden Kathey Marsden added a comment - Thanks Bryan, that makes sense to me. My primary focus is to change the sort so it doesn't see the type mismatch. From what you say, the change to VirtualColumnNode http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/VirtualColumnNode.java?p2=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&p1=%2Fdb%2Fderby%2Fcode%2Ftrunk%2Fjava%2Fengine%2Forg%2Fapache%2Fderby%2Fimpl%2Fsql%2Fcompile%2FVirtualColumnNode.java&r1=554012&r2=554011&view=diff&pathrev=554012 should not have affected the sort phase, but clearly it did. I think it would help a lot to understand why this change was made. Dan could you elaborate?
          Hide
          knutanders Knut Anders Hatlen added a comment -

          Perhaps this modified repro better illustrates that the sort should
          use SQLInteger and not SQLLongint in the original repro:

          ij> create table d3310 (x bigint);
          0 rows inserted/updated/deleted
          ij> insert into d3310 select distinct * from (values 2.0, 2.1, 2.2) v;
          ERROR XJ001: Java exception: 'ASSERT FAILED col1.getClass() (class org.apache.derby.iapi.types.SQLDecimal) expected to be the same as col2.getClass() (class org.apache.derby.iapi.types.SQLLongint): org.apache.derby.shared.common.sanity.AssertFailure'.

          If the cast to BIGINT happened in or before the sort, the above insert
          statement might end up inserting a single row instead of three rows,
          since the values would no longer be distinct.

          Show
          knutanders Knut Anders Hatlen added a comment - Perhaps this modified repro better illustrates that the sort should use SQLInteger and not SQLLongint in the original repro: ij> create table d3310 (x bigint); 0 rows inserted/updated/deleted ij> insert into d3310 select distinct * from (values 2.0, 2.1, 2.2) v; ERROR XJ001: Java exception: 'ASSERT FAILED col1.getClass() (class org.apache.derby.iapi.types.SQLDecimal) expected to be the same as col2.getClass() (class org.apache.derby.iapi.types.SQLLongint): org.apache.derby.shared.common.sanity.AssertFailure'. If the cast to BIGINT happened in or before the sort, the above insert statement might end up inserting a single row instead of three rows, since the values would no longer be distinct.
          Hide
          djd Daniel John Debrunner added a comment -

          For the rationale to the change to VirtualColumnNode (554012)

          From the description of VirtualColumnNode it is a representation of a base column from a view or a table in another part of a query.
          If this is the case then the type of the VirtualColumnNode must be the type of the underlying column. It's role is not as a type
          changer. If it is meant to have that role as well, which would seem to overload its purpose, then that should be clearly documented in the class.
          I think keeping the roles of the various nodes to be clear and simple is the best approach.

          Show
          djd Daniel John Debrunner added a comment - For the rationale to the change to VirtualColumnNode (554012) From the description of VirtualColumnNode it is a representation of a base column from a view or a table in another part of a query. If this is the case then the type of the VirtualColumnNode must be the type of the underlying column. It's role is not as a type changer. If it is meant to have that role as well, which would seem to overload its purpose, then that should be clearly documented in the class. I think keeping the roles of the various nodes to be clear and simple is the best approach.
          Hide
          kmarsden Kathey Marsden added a comment -

          Thanks again everyone for the help.
          In InsertNode.bindStatement we have:
          resultSet = resultSet.genNormalizeResultSetNode(resultSet, false);
          resultColumnList.copyTypesAndLengthsToSource(resultSet.getResultColumns());

          ResultSetNode.genNormalizeResultSetNode makes a shallow copy of the resultColumnList and
          replaces the expressions with VirtualColumnNodes which still point at the source column:
          So ultimately we hit this line in ResultColumnList.copyTypesAndLengthsToSource

          sourceRC.getExpression().setType(resultColumn.getTypeServices());

          which ends up changing the underlying source column type with the new setType method.

          As an experiment, I tried commenting that line in copyTypesAndLengthsToSource
          out, to find out why it is needed and I found that suites.All and derbyall passed with the line
          commented out as did the repro for this issue. I wonder if/why it is needed.

          Below is the experimental patch:
          Index: java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java
          ===================================================================
          — java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java (revision 629575)
          +++ java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java (working copy)
          @@ -1642,7 +1642,7 @@
          ResultColumn sourceRC = (ResultColumn) sourceRCL.elementAt(index);
          ResultColumn resultColumn = (ResultColumn) elementAt(index);
          sourceRC.setType(resultColumn.getTypeServices());

          • sourceRC.getExpression().setType(resultColumn.getTypeServices());
            + //sourceRC.getExpression().setType(resultColumn.getTypeServices());
            }
            }
          Show
          kmarsden Kathey Marsden added a comment - Thanks again everyone for the help. In InsertNode.bindStatement we have: resultSet = resultSet.genNormalizeResultSetNode(resultSet, false); resultColumnList.copyTypesAndLengthsToSource(resultSet.getResultColumns()); ResultSetNode.genNormalizeResultSetNode makes a shallow copy of the resultColumnList and replaces the expressions with VirtualColumnNodes which still point at the source column: So ultimately we hit this line in ResultColumnList.copyTypesAndLengthsToSource sourceRC.getExpression().setType(resultColumn.getTypeServices()); which ends up changing the underlying source column type with the new setType method. As an experiment, I tried commenting that line in copyTypesAndLengthsToSource out, to find out why it is needed and I found that suites.All and derbyall passed with the line commented out as did the repro for this issue. I wonder if/why it is needed. Below is the experimental patch: Index: java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java =================================================================== — java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java (revision 629575) +++ java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java (working copy) @@ -1642,7 +1642,7 @@ ResultColumn sourceRC = (ResultColumn) sourceRCL.elementAt(index); ResultColumn resultColumn = (ResultColumn) elementAt(index); sourceRC.setType(resultColumn.getTypeServices()); sourceRC.getExpression().setType(resultColumn.getTypeServices()); + //sourceRC.getExpression().setType(resultColumn.getTypeServices()); } }
          Hide
          kmarsden Kathey Marsden added a comment -

          Attached is my initial attempt at fixing this issue. I am not quite sure this is the right approach. The fix removes the line call setType on the underlying expressions from copyTypesAndLengthsToSource, to avoid changing the type of the source column of VirtualColumnNodes.

          I haven't been able to find any cases that this change affects adversely, but perhaps this is not the right approach. Maybe the problem is that genNormalizeResultSetNode puts the VirtualColumnNodes in the normalized resultset in the first place. Perhaps they should be CastNodes?
          Then perhaps copyTypesAndLengthsToSource wouldn't be necessary.
          Anyway I would appreciate comments on the patch and possible alternatives.

          Just for some historical background, I traced copyLengthsAndTypesToSource back to its origin.
          It was put in to address a case that is no longer valid with Derby, but I'll put it here just in case it helps:

          – the following gets an exception trying to convert '11.4' to
          an integer
          create table trash(c1 real check(c1 > 2));
          insert into trash values '11.4';

          Thanks

          Kathey

          Show
          kmarsden Kathey Marsden added a comment - Attached is my initial attempt at fixing this issue. I am not quite sure this is the right approach. The fix removes the line call setType on the underlying expressions from copyTypesAndLengthsToSource, to avoid changing the type of the source column of VirtualColumnNodes. I haven't been able to find any cases that this change affects adversely, but perhaps this is not the right approach. Maybe the problem is that genNormalizeResultSetNode puts the VirtualColumnNodes in the normalized resultset in the first place. Perhaps they should be CastNodes? Then perhaps copyTypesAndLengthsToSource wouldn't be necessary. Anyway I would appreciate comments on the patch and possible alternatives. Just for some historical background, I traced copyLengthsAndTypesToSource back to its origin. It was put in to address a case that is no longer valid with Derby, but I'll put it here just in case it helps: – the following gets an exception trying to convert '11.4' to an integer create table trash(c1 real check(c1 > 2)); insert into trash values '11.4'; Thanks Kathey
          Hide
          djd Daniel John Debrunner added a comment -

          Nice find.

          To me the whole method seems very strange, why does the type of the source node need to be changed?
          I wonder if the whole method should be removed, as you say a CAST node might be better.
          Changing the type of the source node could be a potential for a huge source of bugs, if the implementation
          of the node is expecting its type to match its contents.

          Of course the existing comment in the javadoc for the method is one of those useless comments:

          >>> This is useful when adding a NormalizeResultSetNode.

          The missing part is why is it useful. An example of why I try to request good comments during code reviews,
          why is the most important part of any comment.

          Show
          djd Daniel John Debrunner added a comment - Nice find. To me the whole method seems very strange, why does the type of the source node need to be changed? I wonder if the whole method should be removed, as you say a CAST node might be better. Changing the type of the source node could be a potential for a huge source of bugs, if the implementation of the node is expecting its type to match its contents. Of course the existing comment in the javadoc for the method is one of those useless comments: >>> This is useful when adding a NormalizeResultSetNode. The missing part is why is it useful. An example of why I try to request good comments during code reviews, why is the most important part of any comment.
          Hide
          kmarsden Kathey Marsden added a comment -

          Thanks Dan for looking at the patch. Do you think this is a
          sufficient incremental improvement to resolve the regression?
          then I can file a separate issue to investgate the wisdom and future
          of copyLengthsAndTypesToSource.

          Simply removing it lead to all sorts of unpleasantness, such as
          ij> insert into d3310 select distinct * from (values 2.0, 2.1, 2.2) v;
          3 rows inserted/updated/deleted
          ij> select * from d3310;
          X
          --------------------
          ERROR XSDA7: Restore of a serializable or SQLData object of class , attempted to read more data than was originally stor
          ed
          ERROR XJ001: Java exception: ': java.io.EOFException'.
          ij> exit;

          I am going to attach a diff file of the original change that inserted the method.
          I am not sure I understand all the code that was there before but I think
          it was perhaps superior to the fix.

          Show
          kmarsden Kathey Marsden added a comment - Thanks Dan for looking at the patch. Do you think this is a sufficient incremental improvement to resolve the regression? then I can file a separate issue to investgate the wisdom and future of copyLengthsAndTypesToSource. Simply removing it lead to all sorts of unpleasantness, such as ij> insert into d3310 select distinct * from (values 2.0, 2.1, 2.2) v; 3 rows inserted/updated/deleted ij> select * from d3310; X -------------------- ERROR XSDA7: Restore of a serializable or SQLData object of class , attempted to read more data than was originally stor ed ERROR XJ001: Java exception: ': java.io.EOFException'. ij> exit; I am going to attach a diff file of the original change that inserted the method. I am not sure I understand all the code that was there before but I think it was perhaps superior to the fix.
          Hide
          djd Daniel John Debrunner added a comment -

          > Do you think this is a sufficient incremental improvement to resolve the regression?

          In my opinion not at the moment, I'm uncomfortable with fixes that can't be explained. At the moment it's a case
          of :

          • if I do this then it appears to work but I don't know why.

          It's good that you've found the likely source of the problem, but it's just a starting point.
          I'll spend some time trying to understand what that method is trying to do, it probably has to
          do with some comments in ResultSetNode.genNormalizeResultSetNode:

          • Put a NormalizeResultSetNode on top of the specified ResultSetNode.
          • ColumnReferences must continue to point to the same ResultColumn, so
          • that ResultColumn must percolate up to the new PRN.

          Not sure what that means exactly, but it seems related.

          Show
          djd Daniel John Debrunner added a comment - > Do you think this is a sufficient incremental improvement to resolve the regression? In my opinion not at the moment, I'm uncomfortable with fixes that can't be explained. At the moment it's a case of : if I do this then it appears to work but I don't know why. It's good that you've found the likely source of the problem, but it's just a starting point. I'll spend some time trying to understand what that method is trying to do, it probably has to do with some comments in ResultSetNode.genNormalizeResultSetNode: Put a NormalizeResultSetNode on top of the specified ResultSetNode. ColumnReferences must continue to point to the same ResultColumn, so that ResultColumn must percolate up to the new PRN. Not sure what that means exactly, but it seems related.
          Hide
          djd Daniel John Debrunner added a comment -

          ResultSetNode.genNormalizeResultSetNode() is a little strange:

          Its purpose is to push a normalize result set node on-top of a ResultSetNode.

          The passed in child node is always the node is actually invoked on, so it's always operating on this.
          (I'll attach a simple patch for that, it simplifies the code a little, otherwise it appears the method is operating on
          two result set nodes instead of just itself.

          What's strange to me is that even though it is wrapping itself with a normalize node, the code ends up modifying
          its own resultColumns. A normal, expected pattern for such a situation is that the node to be wrapped would
          be unmodified by pushing a normalize node on top of it. This could be related to this issue, the source nodes and
          target nodes might be being mixed up.

          [sorry if this appears unclear, it's just something I noticed and wanted to quickly throw out there]

          Show
          djd Daniel John Debrunner added a comment - ResultSetNode.genNormalizeResultSetNode() is a little strange: Its purpose is to push a normalize result set node on-top of a ResultSetNode. The passed in child node is always the node is actually invoked on, so it's always operating on this. (I'll attach a simple patch for that, it simplifies the code a little, otherwise it appears the method is operating on two result set nodes instead of just itself. What's strange to me is that even though it is wrapping itself with a normalize node, the code ends up modifying its own resultColumns. A normal, expected pattern for such a situation is that the node to be wrapped would be unmodified by pushing a normalize node on top of it. This could be related to this issue, the source nodes and target nodes might be being mixed up. [sorry if this appears unclear, it's just something I noticed and wanted to quickly throw out there]
          Hide
          djd Daniel John Debrunner added a comment -

          [more rambling]
          The pattern of modifying the node being wrapped when wrapping it seems a common theme in ResultSetNode:

          genProjectRestrictForReordering & genProjectRestrict also do it.

          Have to say it's a very counter-intuitive coding style and likely to lead to bugs.

          Show
          djd Daniel John Debrunner added a comment - [more rambling] The pattern of modifying the node being wrapped when wrapping it seems a common theme in ResultSetNode: genProjectRestrictForReordering & genProjectRestrict also do it. Have to say it's a very counter-intuitive coding style and likely to lead to bugs.
          Hide
          djd Daniel John Debrunner added a comment - - edited

          Looking at NormalizeResultSet I now see that cast nodes are not needed. NormalizeResultSet performs type
          conversions itself without any need for cast nodes.

          It needs a set of DataTypeDescriptors that represent the intended types. SO I'm guessing the layout should be
          something like to say convert our INT into a BIGINT

          NormalizeNode
          ResultColumn(BIGINT) -> VirtualColumnNode (??) -> A (below)

          source result set node
          A - ResultColumn (INT) -> constant (INT)

          That's a rough guess, but I think it's beginning to show why the fix works. I think without the change
          the type of A would be set to BIGINT and with the change it would remain at INT.

          Not saying I fully understand it yet, but just throwing more info out there from looking at the code.

          [edit to have conversion (INT TO BIGINT) correct]

          Show
          djd Daniel John Debrunner added a comment - - edited Looking at NormalizeResultSet I now see that cast nodes are not needed. NormalizeResultSet performs type conversions itself without any need for cast nodes. It needs a set of DataTypeDescriptors that represent the intended types. SO I'm guessing the layout should be something like to say convert our INT into a BIGINT NormalizeNode ResultColumn(BIGINT) -> VirtualColumnNode (??) -> A (below) source result set node A - ResultColumn (INT) -> constant (INT) That's a rough guess, but I think it's beginning to show why the fix works. I think without the change the type of A would be set to BIGINT and with the change it would remain at INT. Not saying I fully understand it yet, but just throwing more info out there from looking at the code. [edit to have conversion (INT TO BIGINT) correct]
          Hide
          djd Daniel John Debrunner added a comment -

          Cleanup of ResultSetNode.genNormalizeResultSetNode() so that it always operates in itself (ie. it returns a NormalizeResultSetNode that wraps itself. This was the usage of the method in every case, thus make it the only possible behaviour to make the code clearer.

          Show
          djd Daniel John Debrunner added a comment - Cleanup of ResultSetNode.genNormalizeResultSetNode() so that it always operates in itself (ie. it returns a NormalizeResultSetNode that wraps itself. This was the usage of the method in every case, thus make it the only possible behaviour to make the code clearer.
          Hide
          djd Daniel John Debrunner added a comment -

          [yet more rambling]
          A ResultSetNode has a set of ResultColumns and each ResultColumn has an expression under it.

          A NormalizeResultSetNode is added above a ResultSetNode (wraps it) when the type of at least one of its
          ResultColumn does not match the type of its expression,
          ResultColumn indicates this can be the case & I added some more comments to that.

          Once the decision is made to add a NormalizeResultSetNode there is some processing of ResultColumns and ResultColumnLists
          that I haven't understood yet, this includes the call copyLengthsAndTypesToSource() that Kathey modified.

          Setting up a NormalizeResultSetNode is spread over three locations, the class itself (very little, it's almost acting like a C struct),
          the genNormalizeResultSetNode method and then copyLengthsAndTypesToSource. A good O-O implementation would have
          the logic to create a NormalizeResultSetNode self-contained in NormalizeResultSetNode.

          Since the ResultColumnList of the original ResultSetNode correctly describes the desired outcome, it's not clear to
          me why NormalizeResultSetNode can't just refer to the same list and use it for its processing. They may be some chance
          that this would cause recursion at some point, where a NormalizeResultSetNode would think it needed to be wrapped
          in a NormalizeResultSetNode since the types of its columns and expression don't match (i.e. when it is handled as a regular ResultSetNode).

          I think moving the setup of a NormalizeResultSetNode into the class itself, so that its inputs are just the ResultSetNode to wrap
          would help clear up the code, especially if comments were added indicating why certain actions were being taken.

          Show
          djd Daniel John Debrunner added a comment - [yet more rambling] A ResultSetNode has a set of ResultColumns and each ResultColumn has an expression under it. A NormalizeResultSetNode is added above a ResultSetNode (wraps it) when the type of at least one of its ResultColumn does not match the type of its expression, ResultColumn indicates this can be the case & I added some more comments to that. Once the decision is made to add a NormalizeResultSetNode there is some processing of ResultColumns and ResultColumnLists that I haven't understood yet, this includes the call copyLengthsAndTypesToSource() that Kathey modified. Setting up a NormalizeResultSetNode is spread over three locations, the class itself (very little, it's almost acting like a C struct), the genNormalizeResultSetNode method and then copyLengthsAndTypesToSource. A good O-O implementation would have the logic to create a NormalizeResultSetNode self-contained in NormalizeResultSetNode. Since the ResultColumnList of the original ResultSetNode correctly describes the desired outcome, it's not clear to me why NormalizeResultSetNode can't just refer to the same list and use it for its processing. They may be some chance that this would cause recursion at some point, where a NormalizeResultSetNode would think it needed to be wrapped in a NormalizeResultSetNode since the types of its columns and expression don't match (i.e. when it is handled as a regular ResultSetNode). I think moving the setup of a NormalizeResultSetNode into the class itself, so that its inputs are just the ResultSetNode to wrap would help clear up the code, especially if comments were added indicating why certain actions were being taken.
          Hide
          kmarsden Kathey Marsden added a comment -

          The attached patch derby-3310_remove_genNormalizeResultSetNode_diff.txt moves the contents of ResultSetNode.genNormalizeResultSetNode into NormalizeResultSetNode.init() method. The method still copies the ResultColumnList as before.

          I had a harder time with getting rid of copyTypesAndLengthsToSource, so decided to wait for a second patch for removing that method. I'll post some questions about that after this patch gets committed.

          Show
          kmarsden Kathey Marsden added a comment - The attached patch derby-3310_remove_genNormalizeResultSetNode_diff.txt moves the contents of ResultSetNode.genNormalizeResultSetNode into NormalizeResultSetNode.init() method. The method still copies the ResultColumnList as before. I had a harder time with getting rid of copyTypesAndLengthsToSource, so decided to wait for a second patch for removing that method. I'll post some questions about that after this patch gets committed.
          Hide
          kmarsden Kathey Marsden added a comment -

          I should probably mention that I am not convinced that moving the
          genNormalizeResultSetNode code into the NormalizeResultSetNode init method
          makes things much clearer. It seems to bury the modification of the ResultSetNode
          deeper IMHO. Anyway I would appreciate a second opinion on this and review of
          the patch.

          Thanks

          Kathey

          Show
          kmarsden Kathey Marsden added a comment - I should probably mention that I am not convinced that moving the genNormalizeResultSetNode code into the NormalizeResultSetNode init method makes things much clearer. It seems to bury the modification of the ResultSetNode deeper IMHO. Anyway I would appreciate a second opinion on this and review of the patch. Thanks Kathey
          Hide
          djd Daniel John Debrunner added a comment -

          I haven't looked at the patch yet, but I agree modifying the ResultSetNode (R) to be wrapped inside NormalizeResultSetNode (NR) would be really unclear,
          but I don't think the ResultSetNode R should be modified. It's a very unnatural coding style, subject to bugs, e.g. with the code as-is any other node that thinks
          it is pointing to R's ResultColumns silently gets changed to be pointing to NR's ResultColumns.

          Show
          djd Daniel John Debrunner added a comment - I haven't looked at the patch yet, but I agree modifying the ResultSetNode (R) to be wrapped inside NormalizeResultSetNode (NR) would be really unclear, but I don't think the ResultSetNode R should be modified. It's a very unnatural coding style, subject to bugs, e.g. with the code as-is any other node that thinks it is pointing to R's ResultColumns silently gets changed to be pointing to NR's ResultColumns.
          Hide
          kmarsden Kathey Marsden added a comment -

          I think then I misinterpreted your suggestion:
          >Setting up a NormalizeResultSetNode is spread over three locations, the class itself (very little, it's almost acting like a C struct),
          >the genNormalizeResultSetNode method and then copyLengthsAndTypesToSource. A good O-O implementation would have
          >the logic to create a NormalizeResultSetNode self-contained in NormalizeResultSetNode.

          I was endeavoring to move the existing logic into NormalizeResultSetNode. Perhaps some
          other first steps are more appropriate. Any advice is appreciated. I do feel however that
          I am getting further away, not closer to resolving DERBY-3310.

          Show
          kmarsden Kathey Marsden added a comment - I think then I misinterpreted your suggestion: >Setting up a NormalizeResultSetNode is spread over three locations, the class itself (very little, it's almost acting like a C struct), >the genNormalizeResultSetNode method and then copyLengthsAndTypesToSource. A good O-O implementation would have >the logic to create a NormalizeResultSetNode self-contained in NormalizeResultSetNode. I was endeavoring to move the existing logic into NormalizeResultSetNode. Perhaps some other first steps are more appropriate. Any advice is appreciated. I do feel however that I am getting further away, not closer to resolving DERBY-3310 .
          Hide
          kmarsden Kathey Marsden added a comment -

          Unmarking patch available. It seems that it does not make sense to move the logic into NormalizeResultSetNode until we can avoid changing the underlying ResultSetNode. Dan suggested I try to make a write up of the current behaviour and what is being modified. I will try to do that.

          Kathey

          Show
          kmarsden Kathey Marsden added a comment - Unmarking patch available. It seems that it does not make sense to move the logic into NormalizeResultSetNode until we can avoid changing the underlying ResultSetNode. Dan suggested I try to make a write up of the current behaviour and what is being modified. I will try to do that. Kathey
          Hide
          kmarsden Kathey Marsden added a comment -

          genNormalizeResultSetNode() has this code:
          /* We get a shallow copy of the ResultColumnList and its

          • ResultColumns. (Copy maintains ResultColumn.expression for now.)
            */
            ResultColumnList prRCList = resultColumns;
            resultColumns = resultColumns.copyListAndObjects();

          ...
          Change prRCList to remove generatedGroupingColumns and add
          VirtualColumnNodes, get NormalizeResultSetNode

          I wanted to understand the difference between doing this and just leaving
          resultColumns alone and working on the copy for prRCList so I changed it to just
          ResultColumnList prRCList = resultColumns.copyListAndObjects();
          ...
          Change prRCList to remove generatedGroupingColumns and add
          VirtualColumnNodes, get NormalizeResultSetNode

          This led to a NullPointerException in the generated code for views.sql doing a select
          from a view.
          Caused by: java.lang.NullPointerException
          at org.apache.derby.exe.acaa7ac093x0118x781cx2bb1xffffc28339b210.e3(Unkn
          own Source)
          at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGener
          atedClass.java:145)
          ... 17 more

          I am not sure I understand the difference in my new code and the old and why the
          new doesn't work. If anyone has an immediate ideas that would be most helpful.
          I'll try to debug the NPE to see if that sheds any light.

          Show
          kmarsden Kathey Marsden added a comment - genNormalizeResultSetNode() has this code: /* We get a shallow copy of the ResultColumnList and its ResultColumns. (Copy maintains ResultColumn.expression for now.) */ ResultColumnList prRCList = resultColumns; resultColumns = resultColumns.copyListAndObjects(); ... Change prRCList to remove generatedGroupingColumns and add VirtualColumnNodes, get NormalizeResultSetNode I wanted to understand the difference between doing this and just leaving resultColumns alone and working on the copy for prRCList so I changed it to just ResultColumnList prRCList = resultColumns.copyListAndObjects(); ... Change prRCList to remove generatedGroupingColumns and add VirtualColumnNodes, get NormalizeResultSetNode This led to a NullPointerException in the generated code for views.sql doing a select from a view. Caused by: java.lang.NullPointerException at org.apache.derby.exe.acaa7ac093x0118x781cx2bb1xffffc28339b210.e3(Unkn own Source) at org.apache.derby.impl.services.reflect.DirectCall.invoke(ReflectGener atedClass.java:145) ... 17 more I am not sure I understand the difference in my new code and the old and why the new doesn't work. If anyone has an immediate ideas that would be most helpful. I'll try to debug the NPE to see if that sheds any light.
          Hide
          kmarsden Kathey Marsden added a comment -

          Unassigning myself from this issue for now while I focus on DERBY-3494 to reorganize the code. I am not sure that that is necessary to resolve this issue. Perhaps someone has a more straight forward solution and can get this regression fixed for 10.4.

          Status of the issue as I leave it is:

          I submitted a patch derby-3310_try1_diff.txt that resolves the issue and passes regression tests. That patch removes the change of the underlying expression of the NormalizeResultSetNode, so that it does not affect the sort. ResultSetNode expects the type of the column to be different from the expression in this case, so the patch seems to resolve the issue. Dan had some concerns with the patch however, since the Normalization code is not clear, we are not sure if it is ok to remove the change of the expresion type.

          Dan thought reorganizing the normalization code to avoid changing the underlying result set would make things clearer (DERBY-3494), so I am looking at that now.

          Show
          kmarsden Kathey Marsden added a comment - Unassigning myself from this issue for now while I focus on DERBY-3494 to reorganize the code. I am not sure that that is necessary to resolve this issue. Perhaps someone has a more straight forward solution and can get this regression fixed for 10.4. Status of the issue as I leave it is: I submitted a patch derby-3310_try1_diff.txt that resolves the issue and passes regression tests. That patch removes the change of the underlying expression of the NormalizeResultSetNode, so that it does not affect the sort. ResultSetNode expects the type of the column to be different from the expression in this case, so the patch seems to resolve the issue. Dan had some concerns with the patch however, since the Normalization code is not clear, we are not sure if it is ok to remove the change of the expresion type. Dan thought reorganizing the normalization code to avoid changing the underlying result set would make things clearer ( DERBY-3494 ), so I am looking at that now.
          Hide
          army A B added a comment -

          Dan> I don't think the ResultSetNode R should be modified. It's a very unnatural coding style,
          Dan> subject to bugs, e.g. with the code as-is any other node that thinks it is pointing to R's
          Dan> ResultColumns silently gets changed to be pointing to NR's ResultColumns.

          After doing a bit of investigation for DERBY-3494, I think this "silent update" effect is precisely
          why ResultSetNode R is being modified. That is, if there is some column reference pointing
          to R's ResultColumns and then additional nodes are generated on top of R, it appears that
          those column references must point to the "top" of the newly-generated nodes instead of
          to R. At least, that appears to be what execution-time logic expects (see DERBY-3494).

          So the execution code expects that any ColumnReference node which thinks it is pointing
          to R's ResultColumns is silently changed to point to the NR's result columns. It may be
          "unnatural" and "subject to bugs", but this particular "side effect" looks to be intentional. Not
          good, necessarily, but intentional...

          I'm not saying anything else about this particular issue (I still need to catch up on the discussion),
          but I thought I'd throw that out there. Apologies if it's irrelevant.

          Show
          army A B added a comment - Dan> I don't think the ResultSetNode R should be modified. It's a very unnatural coding style, Dan> subject to bugs, e.g. with the code as-is any other node that thinks it is pointing to R's Dan> ResultColumns silently gets changed to be pointing to NR's ResultColumns. After doing a bit of investigation for DERBY-3494 , I think this "silent update" effect is precisely why ResultSetNode R is being modified. That is, if there is some column reference pointing to R's ResultColumns and then additional nodes are generated on top of R, it appears that those column references must point to the "top" of the newly-generated nodes instead of to R. At least, that appears to be what execution-time logic expects (see DERBY-3494 ). So the execution code expects that any ColumnReference node which thinks it is pointing to R's ResultColumns is silently changed to point to the NR's result columns. It may be "unnatural" and "subject to bugs", but this particular "side effect" looks to be intentional. Not good, necessarily, but intentional... I'm not saying anything else about this particular issue (I still need to catch up on the discussion), but I thought I'd throw that out there. Apologies if it's irrelevant.
          Hide
          army A B added a comment -

          I did some tracing through the various compilation stages for the short reproduction provided by
          Knut Anders in his February 21st comment. Attached is d3310_writeup_1.html, which is an attempt
          to describe my findings. Hopefully this is helpful...

          Show
          army A B added a comment - I did some tracing through the various compilation stages for the short reproduction provided by Knut Anders in his February 21st comment. Attached is d3310_writeup_1.html, which is an attempt to describe my findings. Hopefully this is helpful...
          Hide
          kmarsden Kathey Marsden added a comment -

          Thank you Army for your detailed write up and articulating what
          I could not for the proposed change for this issue. Dan given
          Army's analysis, do you think there is now sufficient reasoning for
          the derby-3310_try1_diff.txt change.

          Show
          kmarsden Kathey Marsden added a comment - Thank you Army for your detailed write up and articulating what I could not for the proposed change for this issue. Dan given Army's analysis, do you think there is now sufficient reasoning for the derby-3310_try1_diff.txt change.
          Hide
          kmarsden Kathey Marsden added a comment -

          Attached is a revision of the try1 patch (derby-3310_diff.txt) that adds some comments to the changed method. Since Army confirmed that this change the right thing to do, I would like to commit this patch to get rid of this regression for 10.4. I welcome comments on the comments and the patch.

          Ultimately with DERBY-3494 I would like to try to roll this method into NormalizeResultSetNode, but for this fix I think it best not to combine it with code reorg.

          Show
          kmarsden Kathey Marsden added a comment - Attached is a revision of the try1 patch (derby-3310_diff.txt) that adds some comments to the changed method. Since Army confirmed that this change the right thing to do, I would like to commit this patch to get rid of this regression for 10.4. I welcome comments on the comments and the patch. Ultimately with DERBY-3494 I would like to try to roll this method into NormalizeResultSetNode, but for this fix I think it best not to combine it with code reorg.
          Hide
          kmarsden Kathey Marsden added a comment -

          had an extraneous file in the diff

          Show
          kmarsden Kathey Marsden added a comment - had an extraneous file in the diff
          Hide
          kmarsden Kathey Marsden added a comment -

          I plan to commit DERBY-3310_diff.txt this afternoon.
          Please let me know if you have any concerns.

          Show
          kmarsden Kathey Marsden added a comment - I plan to commit DERBY-3310 _diff.txt this afternoon. Please let me know if you have any concerns.
          Hide
          djd Daniel John Debrunner added a comment -

          It would be good to get the contents of Army's analysis into the code, rather than just hanging off a jira issue.

          Show
          djd Daniel John Debrunner added a comment - It would be good to get the contents of Army's analysis into the code, rather than just hanging off a jira issue.
          Hide
          kmarsden Kathey Marsden added a comment -

          I wasn't so sure where to hang it in the code where it would make sense, especially after
          the change was made, so I added a short summary to copyTypesAndLengthsToSource
          and referenced the full document. Where in the code would be a good place to put it? I could add it as
          part of the commit log or ...

          Show
          kmarsden Kathey Marsden added a comment - I wasn't so sure where to hang it in the code where it would make sense, especially after the change was made, so I added a short summary to copyTypesAndLengthsToSource and referenced the full document. Where in the code would be a good place to put it? I could add it as part of the commit log or ...
          Hide
          djd Daniel John Debrunner added a comment -

          The commit log is not the correct place.

          I don't know where the correct place is, but it seems to be explaining how various nodes work together etc.,
          so appropriate comments in the javadoc for nodes and/or their methods seems the best. I'm sure it's not just
          a matter of taking the text and placing it in one location. It just seems a shame for the information Army worked hard
          to produce not to be in the most useful place.

          If anyone gains understanding of any piece of code during development or bug-fixing then it's best for the community
          if that knowledge is kept, typically by adding comments to the code.

          Another option is the language how-it-works section on the wiki, there it probably could be as a single document.

          Show
          djd Daniel John Debrunner added a comment - The commit log is not the correct place. I don't know where the correct place is, but it seems to be explaining how various nodes work together etc., so appropriate comments in the javadoc for nodes and/or their methods seems the best. I'm sure it's not just a matter of taking the text and placing it in one location. It just seems a shame for the information Army worked hard to produce not to be in the most useful place. If anyone gains understanding of any piece of code during development or bug-fixing then it's best for the community if that knowledge is kept, typically by adding comments to the code. Another option is the language how-it-works section on the wiki, there it probably could be as a single document.
          Hide
          kmarsden Kathey Marsden added a comment -

          Resolving for 10.5 and 10.4. Thanks so much Army for your help with this issue. I still plan to post some form of Army's document on the Wiki.

          Show
          kmarsden Kathey Marsden added a comment - Resolving for 10.5 and 10.4. Thanks so much Army for your help with this issue. I still plan to post some form of Army's document on the Wiki.
          Hide
          bryanpendleton Bryan Pendleton added a comment -

          A little late on this, but I finally got around to having a close read
          of Army's writeup and it seems 100% accurate to me. It's a
          wonderful document, and it really helped me understand how
          the Normalize node works, as well as the DERBY-2775 changes interacted
          to exhibit this behavior.

          I think that the document would be quite valuable in the
          class-level javadoc for NormalizeResultSetNode.java, as it
          really does an excellent job of explaining how NRSN is intended to work.
          It's a clear example, and quite carefully worded and thorough.

          I also think that the document would be quite useful in the wiki,
          if that proves to be easier.

          Show
          bryanpendleton Bryan Pendleton added a comment - A little late on this, but I finally got around to having a close read of Army's writeup and it seems 100% accurate to me. It's a wonderful document, and it really helped me understand how the Normalize node works, as well as the DERBY-2775 changes interacted to exhibit this behavior. I think that the document would be quite valuable in the class-level javadoc for NormalizeResultSetNode.java, as it really does an excellent job of explaining how NRSN is intended to work. It's a clear example, and quite carefully worded and thorough. I also think that the document would be quite useful in the wiki, if that proves to be easier.

            People

            • Assignee:
              kmarsden Kathey Marsden
              Reporter:
              dyret Dyre Tjeldvoll
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development