Derby
  1. Derby
  2. DERBY-4695

Internal assignment of tablenumer, columnnumber looks wrong in query tree, although no ill effects are seen.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.7.1.1
    • Fix Version/s: 10.7.1.1
    • Component/s: SQL
    • Labels:
      None

      Description

      While looking into derby-4679, I also looked at the query in
      derby-2526 to validate that my changes also worked for that variant of
      the problem.

      During inspection of the query trees after the bind phase, I found one
      instance in which the pair (tablenumber, columnnumber) of a column
      reference was wrong. Although it did not seem to impact the query
      result, I note it here as as something we should probably investiate
      as it may be a symptom an underlying problem, or a potential for
      future problems.

      The query looks like this:

      select b3.* from b3 join bvw on (b3.c8 = bvw.c5) join b4 on (bvw.c1 = b4.c7) where b4.c4 = 42"

      and the underlying DDL is this:

      create table b2 (c1 int, c2 int, c3 char(1), c4 int, c5 int, c6 int);
      create table b4 (c7 int, c4 int, c6 int);
      create table b3 (c8 int, c9 int, c5 int, c6 int);
      create table b (c1 int, c2 int, c3 char(1), c4 int, c5 int, c6 int);
      create view bvw (c5, c1 ,c2 ,c3 ,c4) as
      select c5, c1 ,c2 ,c3 ,c4 from b2 union
      select c5, c1 ,c2 ,c3 ,c4 from b;
      create view bvw2 (c1 ,c2 ,c3 ,c4 ,c5) as

      After the bind phase, the join clause "bvw.c1 = b4.c7" has the
      following entry for the column reference bvw.C1:

      tableNumber: 1
      columnNumber: 6

      The problem is that the node with tablenumber 1 is bvw, which is the
      view with the subquery for the union, which has only 5 resulting
      columns, so 6 must be wrong. Although both the view participant tables
      (b, b2) both have six column, the view does not. In any case, C1 is
      column 2 in the view and column 2 in the two union selects from both b
      and b2.

      C1 is however, column 6 of the join node resulting from "select b3.*
      from b3 join bvw on (b3.c8 = bvw.c5)", but the correct table number for
      that would be 5, not 1.

      So, it would seem the table number has been bound to the bvw view's
      result set, but the column number has been bound to the innermost join
      node's result set. This looks worrying to me.

      See derby.log attached for the full dump of the query tree after the
      bind phase.

      sourceResultSet:
      org.apache.derby.impl.sql.compile.FromSubquery@12789d2
      correlation Name: BVW
      null
      tableNumber 1 <------------------------------------------- Note!
      level 0
      resultSetNumber: 0
      referencedTableMap: null
      statementResultSet: false
      resultColumns:
      org.apache.derby.impl.sql.compile.ResultColumnList@c943d1
      indexRow: false
      orderBySelect: 0
      [0]:
      org.apache.derby.impl.sql.compile.ResultColumn@d3c6a3
      **truncated**
      [1]:
      org.apache.derby.impl.sql.compile.ResultColumn@18352d8
      exposedName: C1
      name: C1
      tableName: null
      isDefaultColumn: false
      wasDefaultColumn: false
      isNameGenerated: false
      sourceTableName: B2
      type: INTEGER
      columnDescriptor: null
      isGenerated: false
      isGeneratedForUnmatchedColumnInInsert: false
      isGroupingColumn: false
      isReferenced: true
      isRedundant: false
      virtualColumnId: 2
      resultSetNumber: -1
      dataTypeServices: INTEGER
      expression:
      org.apache.derby.impl.sql.compile.VirtualColumnNode@b40ec4
      dataTypeServices: null
      sourceColumn:
      org.apache.derby.impl.sql.compile.ResultColumn@1d95da8
      **truncated**
      sourceResultSet:
      org.apache.derby.impl.sql.compile.UnionNode@14d7745
      **truncated**
      [2]:
      org.apache.derby.impl.sql.compile.ResultColumn@13576a2
      exposedName: C2
      name: C2
      tableName: null
      isDefaultColumn: false
      wasDefaultColumn: false
      isNameGenerated: false
      sourceTableName: B2
      type: INTEGER
      columnDescriptor: null
      isGenerated: false
      isGeneratedForUnmatchedColumnInInsert: false
      isGroupingColumn: false
      isReferenced: true
      isRedundant: false
      virtualColumnId: 3
      resultSetNumber: -1
      dataTypeServices: INTEGER
      expression:
      org.apache.derby.impl.sql.compile.VirtualColumnNode@ff8c74
      dataTypeServices: null
      sourceColumn:
      org.apache.derby.impl.sql.compile.ResultColumn@61736e
      **truncated**
      sourceResultSet:
      org.apache.derby.impl.sql.compile.UnionNode@14d7745
      **truncated**
      [3]:
      org.apache.derby.impl.sql.compile.ResultColumn@15e2ccd
      exposedName: C3
      name: C3
      tableName: null
      isDefaultColumn: false
      wasDefaultColumn: false
      isNameGenerated: false
      sourceTableName: B2
      type: CHAR(1)
      columnDescriptor: null
      isGenerated: false
      isGeneratedForUnmatchedColumnInInsert: false
      isGroupingColumn: false
      isReferenced: true
      isRedundant: false
      virtualColumnId: 4
      resultSetNumber: -1
      dataTypeServices: CHAR(1)
      expression:
      org.apache.derby.impl.sql.compile.VirtualColumnNode@1cf7491
      dataTypeServices: null
      sourceColumn:
      org.apache.derby.impl.sql.compile.ResultColumn@11946c2
      **truncated**
      sourceResultSet:
      org.apache.derby.impl.sql.compile.UnionNode@14d7745
      **truncated**
      [4]: <----------------------------------------- highest column number is 5 (index is zero-based)
      org.apache.derby.impl.sql.compile.ResultColumn@edf730
      exposedName: C4
      name: C4
      tableName: null
      isDefaultColumn: false
      wasDefaultColumn: false
      isNameGenerated: false
      sourceTableName: B2
      type: INTEGER
      columnDescriptor: null
      isGenerated: false
      isGeneratedForUnmatchedColumnInInsert: false
      isGroupingColumn: false
      isReferenced: true
      isRedundant: false
      virtualColumnId: 5
      resultSetNumber: -1
      dataTypeServices: INTEGER
      expression:
      org.apache.derby.impl.sql.compile.VirtualColumnNode@ff94b1
      dataTypeServices: null
      sourceColumn:
      org.apache.derby.impl.sql.compile.ResultColumn@17a4989
      **truncated**
      sourceResultSet:
      org.apache.derby.impl.sql.compile.UnionNode@14d7745
      **truncated**
      subquery:
      org.apache.derby.impl.sql.compile.UnionNode@14d7745
      **truncated**

      1. derby.log
        79 kB
        Dag H. Wanvik
      2. derby-4695-fixcolumnno-1a.diff
        8 kB
        Dag H. Wanvik
      3. derby-4695-fixcolumnno-1a.stat
        0.5 kB
        Dag H. Wanvik
      4. derby-4695-fixcolumnno-1b.diff
        8 kB
        Dag H. Wanvik
      5. derby-4695-fixcolumnno-1b.stat
        0.5 kB
        Dag H. Wanvik
      6. query-a-after.log
        144 kB
        Dag H. Wanvik
      7. query-a-before.log
        144 kB
        Dag H. Wanvik
      8. query-b-after.log
        277 kB
        Dag H. Wanvik
      9. query-b-before.log
        277 kB
        Dag H. Wanvik
      10. trace-remapping.diff
        0.8 kB
        Dag H. Wanvik
      11. why-ij-can-do-without-2nd-bind.txt
        50 kB
        Dag H. Wanvik
      12. why-loj-needs-2nd-bind.txt
        67 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Dag H. Wanvik created issue -
          Dag H. Wanvik made changes -
          Field Original Value New Value
          Attachment derby.log [ 12446520 ]
          Dag H. Wanvik made changes -
          Affects Version/s 10.7.0.0 [ 12314971 ]
          Dag H. Wanvik made changes -
          Summary Internal assignment of tablenumer, columnnumber looks wrong in query, although no ill effetcs are seen. Internal assignment of tablenumer, columnnumber looks wrong in query tree, although no ill effects are seen.
          Dag H. Wanvik made changes -
          Link This issue is related to DERBY-4679 [ DERBY-4679 ]
          Dag H. Wanvik made changes -
          Link This issue is related to DERBY-2526 [ DERBY-2526 ]
          Dag H. Wanvik made changes -
          Description While looking into derby-4679, I also looked at the query in
          derby-2526 to validate that my changes also worked for that variant of
          the problem.

          During inspection of the query trees after the bind phase, I found one
          instance in which the pair (tablenumber, columnnumber) of a column
          reference was wrong. Although it did not seem to impact the query
          result, I note it here as as something we should probably investiate
          as it may be a symptom an underlying problem, or a potential for
          future problems.

          The query looks like this:

          select b3.* from b3 join bvw on (b3.c8 = bvw.c5) join b4 on (bvw.c1 = b4.c7) where b4.c4 = 42"

          and the underlying DDL is this:

          create table b2 (c1 int, c2 int, c3 char(1), c4 int, c5 int, c6 int);
          create table b4 (c7 int, c4 int, c6 int);
          create table b3 (c8 int, c9 int, c5 int, c6 int);
          create table b (c1 int, c2 int, c3 char(1), c4 int, c5 int, c6 int);
          create view bvw (c5, c1 ,c2 ,c3 ,c4) as
          select c5, c1 ,c2 ,c3 ,c4 from b2 union
          select c5, c1 ,c2 ,c3 ,c4 from b;
          create view bvw2 (c1 ,c2 ,c3 ,c4 ,c5) as

          After the bind phase, the join clause "bvw.c1 = b4.c7" has the
          following entry for the column reference bvw.C1:
                    
                    tableNumber: 1
                    columnNumber: 6

          The problem is that the node with tablenumber 1 is bvw, which is the
          view with the subquery for the union, which has only 5 resulting
          columns, so 6 must be wrong. Although both the view participant tables
          (b, b2) both have six column, the view does not. In any case, C1 is
          column 2 in the view and column 2 in both b and b2.

          C1 is however, column 6 of the join node resulting from "select b3.*
          from b3 join bvw on (b3.c8 = bvw.c5)", but the correct table number for
          that would be 5, not 1.

          So, it would seem the table number has been bound to the bvw view's
          result set, but the column number has been bound to the innermost join
          node's result set. This looks worrying to me.

          See derby.log attached for the full dump of the query tree after the
          bind phase.

          sourceResultSet:
              org.apache.derby.impl.sql.compile.FromSubquery@12789d2
              correlation Name: BVW
              null
              tableNumber 1 <------------------------------------------- Note!
              level 0
              resultSetNumber: 0
              referencedTableMap: null
              statementResultSet: false
              resultColumns:
                  org.apache.derby.impl.sql.compile.ResultColumnList@c943d1
                  indexRow: false
                  orderBySelect: 0
                  [0]:
                  org.apache.derby.impl.sql.compile.ResultColumn@d3c6a3
                  ***truncated***
                  [1]:
                  org.apache.derby.impl.sql.compile.ResultColumn@18352d8
                  exposedName: C1
                  name: C1
                  tableName: null
                  isDefaultColumn: false
                  wasDefaultColumn: false
                  isNameGenerated: false
                  sourceTableName: B2
                  type: INTEGER
                  columnDescriptor: null
                  isGenerated: false
                  isGeneratedForUnmatchedColumnInInsert: false
                  isGroupingColumn: false
                  isReferenced: true
                  isRedundant: false
                  virtualColumnId: 2
                  resultSetNumber: -1
                  dataTypeServices: INTEGER
                  expression:
                      org.apache.derby.impl.sql.compile.VirtualColumnNode@b40ec4
                      dataTypeServices: null
                      sourceColumn:
                          org.apache.derby.impl.sql.compile.ResultColumn@1d95da8
                          ***truncated***
                      sourceResultSet:
                          org.apache.derby.impl.sql.compile.UnionNode@14d7745
                          ***truncated***
                  [2]:
                  org.apache.derby.impl.sql.compile.ResultColumn@13576a2
                  exposedName: C2
                  name: C2
                  tableName: null
                  isDefaultColumn: false
                  wasDefaultColumn: false
                  isNameGenerated: false
                  sourceTableName: B2
                  type: INTEGER
                  columnDescriptor: null
                  isGenerated: false
                  isGeneratedForUnmatchedColumnInInsert: false
                  isGroupingColumn: false
                  isReferenced: true
                  isRedundant: false
                  virtualColumnId: 3
                  resultSetNumber: -1
                  dataTypeServices: INTEGER
                  expression:
                      org.apache.derby.impl.sql.compile.VirtualColumnNode@ff8c74
                      dataTypeServices: null
                      sourceColumn:
                          org.apache.derby.impl.sql.compile.ResultColumn@61736e
                          ***truncated***
                      sourceResultSet:
                          org.apache.derby.impl.sql.compile.UnionNode@14d7745
                          ***truncated***
                  [3]:
                  org.apache.derby.impl.sql.compile.ResultColumn@15e2ccd
                  exposedName: C3
                  name: C3
                  tableName: null
                  isDefaultColumn: false
                  wasDefaultColumn: false
                  isNameGenerated: false
                  sourceTableName: B2
                  type: CHAR(1)
                  columnDescriptor: null
                  isGenerated: false
                  isGeneratedForUnmatchedColumnInInsert: false
                  isGroupingColumn: false
                  isReferenced: true
                  isRedundant: false
                  virtualColumnId: 4
                  resultSetNumber: -1
                  dataTypeServices: CHAR(1)
                  expression:
                      org.apache.derby.impl.sql.compile.VirtualColumnNode@1cf7491
                      dataTypeServices: null
                      sourceColumn:
                          org.apache.derby.impl.sql.compile.ResultColumn@11946c2
                          ***truncated***
                      sourceResultSet:
                          org.apache.derby.impl.sql.compile.UnionNode@14d7745
                          ***truncated***
                  [4]: <----------------------------------------- highest column number is 5 (index is zero-based)
                  org.apache.derby.impl.sql.compile.ResultColumn@edf730
                  exposedName: C4
                  name: C4
                  tableName: null
                  isDefaultColumn: false
                  wasDefaultColumn: false
                  isNameGenerated: false
                  sourceTableName: B2
                  type: INTEGER
                  columnDescriptor: null
                  isGenerated: false
                  isGeneratedForUnmatchedColumnInInsert: false
                  isGroupingColumn: false
                  isReferenced: true
                  isRedundant: false
                  virtualColumnId: 5
                  resultSetNumber: -1
                  dataTypeServices: INTEGER
                  expression:
                      org.apache.derby.impl.sql.compile.VirtualColumnNode@ff94b1
                      dataTypeServices: null
                      sourceColumn:
                          org.apache.derby.impl.sql.compile.ResultColumn@17a4989
                          ***truncated***
                      sourceResultSet:
                          org.apache.derby.impl.sql.compile.UnionNode@14d7745
                          ***truncated***
              subquery:
                  org.apache.derby.impl.sql.compile.UnionNode@14d7745
                  ***truncated***
          While looking into derby-4679, I also looked at the query in
          derby-2526 to validate that my changes also worked for that variant of
          the problem.

          During inspection of the query trees after the bind phase, I found one
          instance in which the pair (tablenumber, columnnumber) of a column
          reference was wrong. Although it did not seem to impact the query
          result, I note it here as as something we should probably investiate
          as it may be a symptom an underlying problem, or a potential for
          future problems.

          The query looks like this:

          select b3.* from b3 join bvw on (b3.c8 = bvw.c5) join b4 on (bvw.c1 = b4.c7) where b4.c4 = 42"

          and the underlying DDL is this:

          create table b2 (c1 int, c2 int, c3 char(1), c4 int, c5 int, c6 int);
          create table b4 (c7 int, c4 int, c6 int);
          create table b3 (c8 int, c9 int, c5 int, c6 int);
          create table b (c1 int, c2 int, c3 char(1), c4 int, c5 int, c6 int);
          create view bvw (c5, c1 ,c2 ,c3 ,c4) as
          select c5, c1 ,c2 ,c3 ,c4 from b2 union
          select c5, c1 ,c2 ,c3 ,c4 from b;
          create view bvw2 (c1 ,c2 ,c3 ,c4 ,c5) as

          After the bind phase, the join clause "bvw.c1 = b4.c7" has the
          following entry for the column reference bvw.C1:
                    
                    tableNumber: 1
                    columnNumber: 6

          The problem is that the node with tablenumber 1 is bvw, which is the
          view with the subquery for the union, which has only 5 resulting
          columns, so 6 must be wrong. Although both the view participant tables
          (b, b2) both have six column, the view does not. In any case, C1 is
          column 2 in the view and column 2 in the two union selects from both b
          and b2.

          C1 is however, column 6 of the join node resulting from "select b3.*
          from b3 join bvw on (b3.c8 = bvw.c5)", but the correct table number for
          that would be 5, not 1.

          So, it would seem the table number has been bound to the bvw view's
          result set, but the column number has been bound to the innermost join
          node's result set. This looks worrying to me.

          See derby.log attached for the full dump of the query tree after the
          bind phase.

          sourceResultSet:
              org.apache.derby.impl.sql.compile.FromSubquery@12789d2
              correlation Name: BVW
              null
              tableNumber 1 <------------------------------------------- Note!
              level 0
              resultSetNumber: 0
              referencedTableMap: null
              statementResultSet: false
              resultColumns:
                  org.apache.derby.impl.sql.compile.ResultColumnList@c943d1
                  indexRow: false
                  orderBySelect: 0
                  [0]:
                  org.apache.derby.impl.sql.compile.ResultColumn@d3c6a3
                  ***truncated***
                  [1]:
                  org.apache.derby.impl.sql.compile.ResultColumn@18352d8
                  exposedName: C1
                  name: C1
                  tableName: null
                  isDefaultColumn: false
                  wasDefaultColumn: false
                  isNameGenerated: false
                  sourceTableName: B2
                  type: INTEGER
                  columnDescriptor: null
                  isGenerated: false
                  isGeneratedForUnmatchedColumnInInsert: false
                  isGroupingColumn: false
                  isReferenced: true
                  isRedundant: false
                  virtualColumnId: 2
                  resultSetNumber: -1
                  dataTypeServices: INTEGER
                  expression:
                      org.apache.derby.impl.sql.compile.VirtualColumnNode@b40ec4
                      dataTypeServices: null
                      sourceColumn:
                          org.apache.derby.impl.sql.compile.ResultColumn@1d95da8
                          ***truncated***
                      sourceResultSet:
                          org.apache.derby.impl.sql.compile.UnionNode@14d7745
                          ***truncated***
                  [2]:
                  org.apache.derby.impl.sql.compile.ResultColumn@13576a2
                  exposedName: C2
                  name: C2
                  tableName: null
                  isDefaultColumn: false
                  wasDefaultColumn: false
                  isNameGenerated: false
                  sourceTableName: B2
                  type: INTEGER
                  columnDescriptor: null
                  isGenerated: false
                  isGeneratedForUnmatchedColumnInInsert: false
                  isGroupingColumn: false
                  isReferenced: true
                  isRedundant: false
                  virtualColumnId: 3
                  resultSetNumber: -1
                  dataTypeServices: INTEGER
                  expression:
                      org.apache.derby.impl.sql.compile.VirtualColumnNode@ff8c74
                      dataTypeServices: null
                      sourceColumn:
                          org.apache.derby.impl.sql.compile.ResultColumn@61736e
                          ***truncated***
                      sourceResultSet:
                          org.apache.derby.impl.sql.compile.UnionNode@14d7745
                          ***truncated***
                  [3]:
                  org.apache.derby.impl.sql.compile.ResultColumn@15e2ccd
                  exposedName: C3
                  name: C3
                  tableName: null
                  isDefaultColumn: false
                  wasDefaultColumn: false
                  isNameGenerated: false
                  sourceTableName: B2
                  type: CHAR(1)
                  columnDescriptor: null
                  isGenerated: false
                  isGeneratedForUnmatchedColumnInInsert: false
                  isGroupingColumn: false
                  isReferenced: true
                  isRedundant: false
                  virtualColumnId: 4
                  resultSetNumber: -1
                  dataTypeServices: CHAR(1)
                  expression:
                      org.apache.derby.impl.sql.compile.VirtualColumnNode@1cf7491
                      dataTypeServices: null
                      sourceColumn:
                          org.apache.derby.impl.sql.compile.ResultColumn@11946c2
                          ***truncated***
                      sourceResultSet:
                          org.apache.derby.impl.sql.compile.UnionNode@14d7745
                          ***truncated***
                  [4]: <----------------------------------------- highest column number is 5 (index is zero-based)
                  org.apache.derby.impl.sql.compile.ResultColumn@edf730
                  exposedName: C4
                  name: C4
                  tableName: null
                  isDefaultColumn: false
                  wasDefaultColumn: false
                  isNameGenerated: false
                  sourceTableName: B2
                  type: INTEGER
                  columnDescriptor: null
                  isGenerated: false
                  isGeneratedForUnmatchedColumnInInsert: false
                  isGroupingColumn: false
                  isReferenced: true
                  isRedundant: false
                  virtualColumnId: 5
                  resultSetNumber: -1
                  dataTypeServices: INTEGER
                  expression:
                      org.apache.derby.impl.sql.compile.VirtualColumnNode@ff94b1
                      dataTypeServices: null
                      sourceColumn:
                          org.apache.derby.impl.sql.compile.ResultColumn@17a4989
                          ***truncated***
                      sourceResultSet:
                          org.apache.derby.impl.sql.compile.UnionNode@14d7745
                          ***truncated***
              subquery:
                  org.apache.derby.impl.sql.compile.UnionNode@14d7745
                  ***truncated***
          Hide
          Dag H. Wanvik added a comment -

          Investigating this, I see that initially, the result column is correctly identified tow BVW.C1, which corresponds to
          tableNumber 1 and columnNumber 2 (in JoinNode#getMatchingColumn's call:

          rightRC = logicalRightRS.getMatchingColumn(columnReference);

          which makes sense, since BVW is on the right hand side of the first
          join. So far, so good. In this step, the column reference's
          tablenumber 1 is set, but the column number is not yet set. The table
          number here is set in logicalRightRS alias
          FromSubquery#getMatchingColumn ca line 335. [It is a subquery due to BVW being a view.]

          Now, later in JoinNode#getMatchingColumn I see a block of code which essentially rebinds the result column again. now to
          the RC in the joins concatenated list of result column, in which C1's position is 6 as I noted before.
          This block of code has this interesting comment:

          /* Insert will bind the underlying result sets which have

          • tables twice. On the 2nd bind, resultColumns != null,
          • we must return the RC from the JoinNode's RCL which is above
          • the RC that we just found above. (Otherwise, the source
          • for the ColumnReference will be from the wrong ResultSet
          • at generate().)
            */
            if (resultColumns != null) { <rebind column reference to RC in concatenated list of RC's of the join> }

          Next, we set the column number based on this newly bound RC, with
          the result that the CR of the join predicate's "BVW.C1" has table
          number and column number with contributions from two different
          RCs... Not good.

          [The column number is set at the end of ColumnReference#bindExpression
          near the end of that method based on the bound RC.]

          If I comment out the above block of code, the column references ends
          up with the correct table and column number (1,2), and the query still
          executes correctly.

          It would seem that this block of code is only intended for INSERTs,
          but even for INSERT we could have a similar query with similar join
          flattening and rewrite needs, and so the error would persist for
          INSERTS even if we disabled this block for non-INSERT statements.

          In sum, the source of the problem seems to be that the assignment of
          table number and column number happens in different places, and in the
          meantime, the bound RC has changed..

          Show
          Dag H. Wanvik added a comment - Investigating this, I see that initially, the result column is correctly identified tow BVW.C1, which corresponds to tableNumber 1 and columnNumber 2 (in JoinNode#getMatchingColumn's call: rightRC = logicalRightRS.getMatchingColumn(columnReference); which makes sense, since BVW is on the right hand side of the first join. So far, so good. In this step, the column reference's tablenumber 1 is set, but the column number is not yet set. The table number here is set in logicalRightRS alias FromSubquery#getMatchingColumn ca line 335. [It is a subquery due to BVW being a view.] Now, later in JoinNode#getMatchingColumn I see a block of code which essentially rebinds the result column again . now to the RC in the joins concatenated list of result column, in which C1's position is 6 as I noted before. This block of code has this interesting comment: /* Insert will bind the underlying result sets which have tables twice. On the 2nd bind, resultColumns != null, we must return the RC from the JoinNode's RCL which is above the RC that we just found above. (Otherwise, the source for the ColumnReference will be from the wrong ResultSet at generate().) */ if (resultColumns != null) { <rebind column reference to RC in concatenated list of RC's of the join> } Next, we set the column number based on this newly bound RC, with the result that the CR of the join predicate's "BVW.C1" has table number and column number with contributions from two different RCs... Not good. [The column number is set at the end of ColumnReference#bindExpression near the end of that method based on the bound RC.] If I comment out the above block of code, the column references ends up with the correct table and column number (1,2), and the query still executes correctly. It would seem that this block of code is only intended for INSERTs, but even for INSERT we could have a similar query with similar join flattening and rewrite needs, and so the error would persist for INSERTS even if we disabled this block for non-INSERT statements. In sum, the source of the problem seems to be that the assignment of table number and column number happens in different places, and in the meantime, the bound RC has changed..
          Hide
          Dag H. Wanvik added a comment -

          My primary interest in persuing this issue is that the fix for DERBY-4679 now relies
          on (tablenumber, columnnumber) for the remapping in
          ColumnReference#remapColumnReferencesToExpressions, only falling back to map via
          column name strings when at the BaseColumnNode level.

          Theoretically, if assignment of (tablenumber, columnnumber) is sometimes wrong, as seen in this
          issue, one could end up with a wrong remap. Actually what happens for the query under investigation here
          is that given the wrong column pair (1,6),

          ftRC = rcl.getResultColumn(
          tableNumberBeforeFlattening,
          columnNumberBeforeFlattening)

          fails to find such a column and falls back to the column string method used solely prior to
          the fix for DERBY-4679.

          If correctly bound to (1,2), the test in the beginning of ColumnReference#remapColumnReferencesToExpressions:

          /* Nothing to do if we are not pointing to a redundant RC */
          if (! source.isRedundant())

          { return this; }

          hits in since the column reference is already bound correctly. The fact that the wrong pair exists is masked by the saving grace of the column name based lookup and thus corrected. This is pure luck, though, another combination of numbers could
          have made the (tn, cn) based lookup find the wrong column.

          This is also why this problem has not been seen in practice, perhaps.

          In summary, the presence of the wrong (tn,cn) assignment is a fact, so there is a small chance that the fix for DERBY-4679 could break existing working queries (which work only due to luck, as shown above). Therefore, I would like to fix this issue
          before back-porting the fix of DERBY-4679 to the 10.6 branch.

          Show
          Dag H. Wanvik added a comment - My primary interest in persuing this issue is that the fix for DERBY-4679 now relies on (tablenumber, columnnumber) for the remapping in ColumnReference#remapColumnReferencesToExpressions, only falling back to map via column name strings when at the BaseColumnNode level. Theoretically, if assignment of (tablenumber, columnnumber) is sometimes wrong, as seen in this issue, one could end up with a wrong remap. Actually what happens for the query under investigation here is that given the wrong column pair (1,6), ftRC = rcl.getResultColumn( tableNumberBeforeFlattening, columnNumberBeforeFlattening) fails to find such a column and falls back to the column string method used solely prior to the fix for DERBY-4679 . If correctly bound to (1,2), the test in the beginning of ColumnReference#remapColumnReferencesToExpressions: /* Nothing to do if we are not pointing to a redundant RC */ if (! source.isRedundant()) { return this; } hits in since the column reference is already bound correctly. The fact that the wrong pair exists is masked by the saving grace of the column name based lookup and thus corrected. This is pure luck, though, another combination of numbers could have made the (tn, cn) based lookup find the wrong column. This is also why this problem has not been seen in practice, perhaps. In summary, the presence of the wrong (tn,cn) assignment is a fact, so there is a small chance that the fix for DERBY-4679 could break existing working queries (which work only due to luck, as shown above). Therefore, I would like to fix this issue before back-porting the fix of DERBY-4679 to the 10.6 branch.
          Dag H. Wanvik made changes -
          Assignee Dag H. Wanvik [ dagw ]
          Hide
          Dag H. Wanvik added a comment -

          Commenting out the block described earlier breaks lots of (all?) (left) outer joins, too (NPE in BaseActivation#getColumnFromRow).
          This fits the description about wrong ResultSet at generate time alluded to in the comment.

          Maybe the "Insert" in the comment is misleading.

          Show
          Dag H. Wanvik added a comment - Commenting out the block described earlier breaks lots of (all?) (left) outer joins, too (NPE in BaseActivation#getColumnFromRow). This fits the description about wrong ResultSet at generate time alluded to in the comment. Maybe the "Insert" in the comment is misleading.
          Hide
          Dag H. Wanvik added a comment -

          Performing the rebinding block only for HalfOuterJoinNode (subclass of JoinNode), makes the
          outer joins pass again, and removes the inconsistent binding seen in the original query of this issue and JoinTest still passes.

          It is still unclear to me why outer join should need this rebinding code, though, but not inner joins (at least those we test in JoinTest).

          Show
          Dag H. Wanvik added a comment - Performing the rebinding block only for HalfOuterJoinNode (subclass of JoinNode), makes the outer joins pass again, and removes the inconsistent binding seen in the original query of this issue and JoinTest still passes. It is still unclear to me why outer join should need this rebinding code, though, but not inner joins (at least those we test in JoinTest).
          Hide
          Dag H. Wanvik added a comment -

          Cf my note on DERBY-4679: Performing the rebinding block only for HalfOuterJoinNode actually also solves
          the DERBY-4679 issue, as well as a variant query which still fails even after the fix for DERBY-4679.

          Show
          Dag H. Wanvik added a comment - Cf my note on DERBY-4679 : Performing the rebinding block only for HalfOuterJoinNode actually also solves the DERBY-4679 issue, as well as a variant query which still fails even after the fix for DERBY-4679 .
          Hide
          Dag H. Wanvik added a comment -

          Uploading why-loj-needs-2nd-bind.txt, which shows graphically what is wrong in query tree just prior to code generation if we skip the rebinding block for this query.

          By way of comparison, I also upload a similar analysis for an inner join which can do without this 2nd binding: why-ij-can-do-without-2nd-bind.txt

          My current thinking is that this hoisting of the column references to point to the RC in the RCL of the join node, is (sometimes) necessary, but can happens too early, cf. the problem addressed in DERBY-4679, since other rewriting can be thwarted by it - we lose the information about which original point in the tree the column reference pointed to. As we have seen it also can (will always?) lead to wrong assignment of table number and column number in the column reference.

          Note that at code generation time, the (tn, cn) information in a column reference can be wrong - it is not used - as long as the source points to the correct result column. In deed it is wrong in the loj sample query, cf. note in why-loj-needs-2nd-bind.txt.

          Note that this doesn't mean that I have concluded that inner join can always do without this 2nd bind, nor that outer join always needs it, this is just a case study to understand what this 2nd bind is doing.

          Show
          Dag H. Wanvik added a comment - Uploading why-loj-needs-2nd-bind.txt, which shows graphically what is wrong in query tree just prior to code generation if we skip the rebinding block for this query. By way of comparison, I also upload a similar analysis for an inner join which can do without this 2nd binding: why-ij-can-do-without-2nd-bind.txt My current thinking is that this hoisting of the column references to point to the RC in the RCL of the join node, is (sometimes) necessary, but can happens too early, cf. the problem addressed in DERBY-4679 , since other rewriting can be thwarted by it - we lose the information about which original point in the tree the column reference pointed to. As we have seen it also can (will always?) lead to wrong assignment of table number and column number in the column reference. Note that at code generation time, the (tn, cn) information in a column reference can be wrong - it is not used - as long as the source points to the correct result column. In deed it is wrong in the loj sample query, cf. note in why-loj-needs-2nd-bind.txt. Note that this doesn't mean that I have concluded that inner join can always do without this 2nd bind, nor that outer join always needs it, this is just a case study to understand what this 2nd bind is doing.
          Dag H. Wanvik made changes -
          Attachment why-loj-needs-2nd-bind.txt [ 12447498 ]
          Attachment why-ij-can-do-without-2nd-bind.txt [ 12447499 ]
          Hide
          Dag H. Wanvik added a comment -

          Another thing to note from these two examples, is that HalfOuterJoinNodes have been assigned table numbers, but plain JoinNodes are not (are still -1).

          Show
          Dag H. Wanvik added a comment - Another thing to note from these two examples, is that HalfOuterJoinNodes have been assigned table numbers, but plain JoinNodes are not (are still -1).
          Hide
          Dag H. Wanvik added a comment -

          The -1 in the table number was a red herring, as this is only seen
          after optimization. It turns out that join nodes are reintroduced
          in SelectNode#modifyAccessPath, which are not involved during the
          flattening process which happens during preprocess time.

          Show
          Dag H. Wanvik added a comment - The -1 in the table number was a red herring, as this is only seen after optimization. It turns out that join nodes are reintroduced in SelectNode#modifyAccessPath, which are not involved during the flattening process which happens during preprocess time.
          Hide
          Dag H. Wanvik added a comment - - edited

          I found the following comment in FromList#bindColumnReference:

          /* TableNumbers are set in the CR in the underlying

          • FromTable. This ensures that they get the table
          • number from the underlying table, not the join node.
          • This is important for beging able to push predicates
          • down through join nodes.

          So it appears that by design, the ColumnReference's source RC is set
          to that of the join, whereas the table number and the column number is
          set to that of the underlying FromTable.

          Referring to the "why-loj-needs-2nd-bind.txt" example:

          select * from t1 left outer join (t2 left outer "
          + "join t3 on t2.c1=t3.c1) on t1.c1=t3.c1

          The 2nd bind sets up the corrects table number
          and column number: although the returned RC from
          JoinNode#getMatchingColumn is that of the JoinNode, the table number
          has been set to the basetable's (not the join's), the column number is
          also set to the position in the base table, since the
          columndescriptor of the returned RC contains the correct column number
          (which is different from the column position of the returned rc in
          the join (2nd bind), cf this call:

          ColumnReference#bindExpression:

          /* Set the columnNumber from the base table.

          • Useful for optimizer and generation.
            */
            columnNumber = matchingRC.getColumnPosition();

          If the FromTable is a base table, matchingRC.getColumnPosition will
          pick it up from the RC's columnDescriptor. This position is the column
          position of the column in the base table, so it is consistent with
          the set table number.

          However, if the underlying FromTable is a subquery, we get the
          virtual column id of the matchingRC, cf. this code for
          getColumnPosition:

          public int getColumnPosition()
          {
          if (columnDescriptor != null)

          { return columnDescriptor.getPosition(); }

          else {
          return virtualColumnId;

          so, the columnNumber ends up pointing to the RC's position in the Join
          node, whereas the table number still points to the FromTable, hence
          the observed inconsistency.

          If this is intentional or wrong, I can't yet say, but it looks weird.
          The comment in ColumnReference#bindExpression says

          "Set the columnNumber from the base table".

          but what we get here is definitely not that

          In the original query posted for this issue, the column number "6"
          observed to look strange, stems from using virtualColumnId
          above, (BVW is a FromSubquery).

          Show
          Dag H. Wanvik added a comment - - edited I found the following comment in FromList#bindColumnReference: /* TableNumbers are set in the CR in the underlying FromTable. This ensures that they get the table number from the underlying table, not the join node. This is important for beging able to push predicates down through join nodes. So it appears that by design, the ColumnReference's source RC is set to that of the join, whereas the table number and the column number is set to that of the underlying FromTable. Referring to the "why-loj-needs-2nd-bind.txt" example: select * from t1 left outer join (t2 left outer " + "join t3 on t2.c1=t3.c1) on t1.c1=t3.c1 The 2nd bind sets up the corrects table number and column number: although the returned RC from JoinNode#getMatchingColumn is that of the JoinNode, the table number has been set to the basetable's (not the join's), the column number is also set to the position in the base table, since the columndescriptor of the returned RC contains the correct column number (which is different from the column position of the returned rc in the join (2nd bind), cf this call: ColumnReference#bindExpression: /* Set the columnNumber from the base table. Useful for optimizer and generation. */ columnNumber = matchingRC.getColumnPosition(); If the FromTable is a base table, matchingRC.getColumnPosition will pick it up from the RC's columnDescriptor. This position is the column position of the column in the base table, so it is consistent with the set table number. However, if the underlying FromTable is a subquery, we get the virtual column id of the matchingRC, cf. this code for getColumnPosition: public int getColumnPosition() { if (columnDescriptor != null) { return columnDescriptor.getPosition(); } else { return virtualColumnId; so, the columnNumber ends up pointing to the RC's position in the Join node, whereas the table number still points to the FromTable, hence the observed inconsistency. If this is intentional or wrong, I can't yet say, but it looks weird. The comment in ColumnReference#bindExpression says "Set the columnNumber from the base table". but what we get here is definitely not that In the original query posted for this issue, the column number "6" observed to look strange, stems from using virtualColumnId above, (BVW is a FromSubquery).
          Hide
          Bryan Pendleton added a comment -

          Gurk. I can envision why pushing predicates around in the tree
          introduces lots of complexity into the management of the column references.

          Your writeups are fascinating, and I'm learning a lot from studying them.

          I have this feeling that you're getting very close to having such a
          thorough knowledge of this subject that you are ready to redesign
          the ColumnReference data structure and replace it with a scheme that
          simplifies and avoids all these problems.

          In the meantime, I'll enjoy continuing to read your findings, so please
          continue posting them as you have the time!

          Show
          Bryan Pendleton added a comment - Gurk. I can envision why pushing predicates around in the tree introduces lots of complexity into the management of the column references. Your writeups are fascinating, and I'm learning a lot from studying them. I have this feeling that you're getting very close to having such a thorough knowledge of this subject that you are ready to redesign the ColumnReference data structure and replace it with a scheme that simplifies and avoids all these problems. In the meantime, I'll enjoy continuing to read your findings, so please continue posting them as you have the time!
          Hide
          Dag H. Wanvik added a comment -

          Thanks for the encouragement, Bryan I'm afraid I am not quite yet
          in the position to replace the existing machinery with a better
          design, but I am slowly learning more about this area, too.

          Meanwhile, I feel more confident that the patch for DERBY-4679 (with
          the added followup-patch) would be safe to back-port from the following
          reasoning:

          • the seen inconsistency lies in the columnNumber, not the tableNumber
          • the (tn, cn)-based lookup used in the patch for DERBY-4679 only
            considers RCs with a correct table number (and column number). If
            the column number in the column reference should be wrong (as
            documented in this issue), the added sanity check for the column
            name to also match, would catch and discard the candidate RC, and
            make the (tn, cn)-based lookup return null.
            I.e. it can not return false positives.
          • subsequently, we would fall back on the exisiting columnName based
            lookup (exisiting behavior)

          This leads me to conclude that the code would always be an improvement over the current solution.

          So, I intend to back-port DERBY-4679 to the 10.6 branch while I
          continue to look for a better solution that would:

          a) keep numbering consistent, if achievable
          b) document the invariants if I can establish any
          b) make the variant query i constructed for DERBY-4679 which still fails (the one that
          contains a subquery) also work.

          Show
          Dag H. Wanvik added a comment - Thanks for the encouragement, Bryan I'm afraid I am not quite yet in the position to replace the existing machinery with a better design, but I am slowly learning more about this area, too. Meanwhile, I feel more confident that the patch for DERBY-4679 (with the added followup-patch) would be safe to back-port from the following reasoning: the seen inconsistency lies in the columnNumber, not the tableNumber the (tn, cn)-based lookup used in the patch for DERBY-4679 only considers RCs with a correct table number (and column number). If the column number in the column reference should be wrong (as documented in this issue), the added sanity check for the column name to also match, would catch and discard the candidate RC, and make the (tn, cn)-based lookup return null. I.e. it can not return false positives. subsequently, we would fall back on the exisiting columnName based lookup (exisiting behavior) This leads me to conclude that the code would always be an improvement over the current solution. So, I intend to back-port DERBY-4679 to the 10.6 branch while I continue to look for a better solution that would: a) keep numbering consistent, if achievable b) document the invariants if I can establish any b) make the variant query i constructed for DERBY-4679 which still fails (the one that contains a subquery) also work.
          Hide
          Dag H. Wanvik added a comment - - edited

          Uploading a patch, derby-4679-fixcolumnno-1a, which fixes the wrong column numbers, by assigning them at the same time as the tableNumber is bound, i.e. in the fromTable classes, rather than in ColumnReference#bindExpression. The fact that JoinNode replaces the RC, will then not affect the actually bound columnNumber like before.

          Cf the enclosed trace patch which I have used to show what happens, trace-remapping.diff. This can be applied both with and without the patch proper.

          I have ported the original DERBY-2526 query over to JoinTest#testDerby_4695 and added one more query, so there are two queries, let's call them a and b.

          The query tree after the bind phase before and after are attached as query-a-before.log and query-a-after.log. There have been normalized, so they can be diffed to see the net difference in binding, which is that the column reference BVW.C1 was 1,6 before the patch and 1,2 after the patch.

          For query a, we see the following trace of remappings done as a result of join flattening before the patch:

          remapping CR: B3.C1 before: 0,1 after 5,1 null
          remapping CR: B3.C9 before: 0,2 after 5,2 null
          remapping CR: B3.C5 before: 0,3 after 5,3 null
          remapping CR: B3.C6 before: 0,4 after 5,4 null
          remapping CR: B4.C4 before: 6,2 after 6,2 null
          remapping CR: B3.C1 before: 0,1 after 0,1 null
          remapping CR: B3.C9 before: 0,2 after 0,2 null
          remapping CR: B3.C5 before: 0,3 after 0,3 null
          remapping CR: B3.C6 before: 0,4 after 0,4 null
          remapping CR: BVW.C1 before: 1,6 after 1,2 B2 <-------- wrong input, but ends up correct

          and the following after the patch:

          remapping CR: B3.C1 before: 0,1 after 5,1 null
          remapping CR: B3.C9 before: 0,2 after 5,2 null
          remapping CR: B3.C5 before: 0,3 after 5,3 null
          remapping CR: B3.C6 before: 0,4 after 5,4 null
          remapping CR: B4.C4 before: 6,2 after 6,2 null
          remapping CR: B3.C1 before: 0,1 after 0,1 null
          remapping CR: B3.C9 before: 0,2 after 0,2 null
          remapping CR: B3.C5 before: 0,3 after 0,3 null
          remapping CR: B3.C6 before: 0,4 after 0,4 null
          remapping CR: BVW.C1 before: 1,2 after 1,2 B2 <--- correct both before and after

          For query b, which is a more complicated join, I also upload similar log files, query-b-before.log and query-b-after.log, which also only differ in their column reference BVW.C1 (several this time, since the column in used in more places). The query looks like this:

          select b3.*, bvw.c1 from b3 inner join bvw on (b3.c1 = bvw.c5)
          inner join b4 on (bvw.c1 = b4.c7)
          inner join b on (bvw.c1 = b.c1)
          inner join b bcorr on bvw.c1 = bcorr.c1
          where b4.c4 = 42"

          which is a bit meaningless, but exercises several flattenings involving bvw.c1. This query also gave a correct answer, but as we can see below, during the intermediate steps, the binding of RC ends up wrong (also one a la DERBY-4679, due to falling back on column name based matching: since the (tablenumber, column number) pair was wrong at the outset, and so made the new lookup method introduced for DERBY-4679 fail to do its job. I believe, although I haven't yet been able to prove it, that the wrong numbering could lead to bugs:

          • wrong or missing transitive closure computation since this is based on (tn, cn)
          • lead to yet other wrong intermediate CRs (shown to happen thrice in the example below, although it doesn't lead to query error here) which could then in turn confuse transitive closure computation, as seen in DERBY-4679 (CR bound to wrong column instance in a JOIN RCL). Note that this represents another mode of "correct t-no, wrong col-no" happening: the primary issue here is the wrong binding in the bind phase, this second mode happens during rebinding in the optimizer's preprocess phase. I.e. error mode one leads to error mode 2, which has been shown to cause bugs.

          Now, let us look at the differences for query b. Without the patch we see:

          remapping CR: B3.C1 before: 0,1 after 9,1 null
          remapping CR: B3.C9 before: 0,2 after 9,2 null
          remapping CR: B3.C5 before: 0,3 after 9,3 null
          remapping CR: B3.C6 before: 0,4 after 9,4 null
          remapping CR: BVW.C1 before: 1,2 after 9,6 B2
          remapping CR: B4.C4 before: 6,2 after 9,11 null
          remapping CR: B3.C1 before: 0,1 after 7,1 null
          remapping CR: B3.C9 before: 0,2 after 7,2 null
          remapping CR: B3.C5 before: 0,3 after 7,3 null
          remapping CR: B3.C6 before: 0,4 after 7,4 null
          remapping CR: BVW.C1 before: 1,2 after 7,6 B2
          remapping CR: B4.C4 before: 6,2 after 7,11 null
          remapping CR: BVW.C1 before: 1,6 after 7,1 null <--- wrong!
          remapping CR: B3.C1 before: 0,1 after 5,1 null
          remapping CR: B3.C9 before: 0,2 after 5,2 null
          remapping CR: B3.C5 before: 0,3 after 5,3 null
          remapping CR: B3.C6 before: 0,4 after 5,4 null
          remapping CR: BVW.C1 before: 1,2 after 5,6 B2
          remapping CR: B4.C4 before: 6,2 after 6,2 null
          remapping CR: BVW.C1 before: 1,6 after 5,1 null <--- wrong
          remapping CR: BVW.C1 before: 1,6 after 5,1 null <--- wrong
          remapping CR: B3.C1 before: 0,1 after 0,1 null
          remapping CR: B3.C9 before: 0,2 after 0,2 null
          remapping CR: B3.C5 before: 0,3 after 0,3 null
          remapping CR: B3.C6 before: 0,4 after 0,4 null
          remapping CR: BVW.C1 before: 1,2 after 1,2 B2
          remapping CR: BVW.C1 before: 1,6 after 1,2 B2 <--- correct with wrong input (rescued ultimately!)
          remapping CR: BVW.C1 before: 1,6 after 1,2 B2 <--- ditto
          remapping CR: BVW.C1 before: 1,6 after 1,2 B2 <--- ditto

          vs after the patch:

          remapping CR: B3.C1 before: 0,1 after 9,1 null
          remapping CR: B3.C9 before: 0,2 after 9,2 null
          remapping CR: B3.C5 before: 0,3 after 9,3 null
          remapping CR: B3.C6 before: 0,4 after 9,4 null
          remapping CR: BVW.C1 before: 1,2 after 9,6 B2
          remapping CR: B4.C4 before: 6,2 after 9,11 null
          remapping CR: B3.C1 before: 0,1 after 7,1 null
          remapping CR: B3.C9 before: 0,2 after 7,2 null
          remapping CR: B3.C5 before: 0,3 after 7,3 null
          remapping CR: B3.C6 before: 0,4 after 7,4 null
          remapping CR: BVW.C1 before: 1,2 after 7,6 B2
          remapping CR: B4.C4 before: 6,2 after 7,11 null
          remapping CR: BVW.C1 before: 1,2 after 7,6 B2 <--- correct
          remapping CR: B3.C1 before: 0,1 after 5,1 null
          remapping CR: B3.C9 before: 0,2 after 5,2 null
          remapping CR: B3.C5 before: 0,3 after 5,3 null
          remapping CR: B3.C6 before: 0,4 after 5,4 null
          remapping CR: BVW.C1 before: 1,2 after 5,6 B2
          remapping CR: B4.C4 before: 6,2 after 6,2 null
          remapping CR: BVW.C1 before: 1,2 after 5,6 B2 <--- correct
          remapping CR: BVW.C1 before: 1,2 after 5,6 B2 <--- correct
          remapping CR: B3.C1 before: 0,1 after 0,1 null
          remapping CR: B3.C9 before: 0,2 after 0,2 null
          remapping CR: B3.C5 before: 0,3 after 0,3 null
          remapping CR: B3.C6 before: 0,4 after 0,4 null
          remapping CR: BVW.C1 before: 1,2 after 1,2 B2
          remapping CR: BVW.C1 before: 1,2 after 1,2 B2 <--- correct
          remapping CR: BVW.C1 before: 1,2 after 1,2 B2 <--- correct
          remapping CR: BVW.C1 before: 1,2 after 1,2 B2 <--- correct

          As can be seen, the fact that BVW.C1 on the join condition is now correctly bound, makes the fix for DERBY-4679 kick in in yet another instance, and return (7,6) rather than (7,1) after the second flattening.

          [ To see that (7,6) is correct, if we look inside the binding log file for query b, we see that table 7 corresponds to the join node with join clause BVW.C1 = B4.C7. This join has a RCL which is 12 columns long. We can see that RC (7,1) corresponds to B3.C1, which is wrong. After the patch, we see the binding is done to (9,6), which is correct, since that points to the correct RC in the RCL of table 7, that is a RC corresponding to RC in the join below, which ultimately stems from the subquery column BVW.C1. ]

          So, we see from the trace that for each flattening, the binding for BVW.C1 stays consistent during the flattening phases with the patch, whereas before it was wrong in several places, and also led to binding to the wrong RC in three cases, although the final flattening saved the day since the CR is then remapped against a RCL that had no duplicates in it - i.e. the subquery representing the view - so (1,6) finally became (1,2).

          The patch ran regressions ok. While I haven't yet been able to prove that this issue could lead to query errors, I do think it could, for the reasons cited above. I therefore recommend we commit this patch as a defensive measure. If people think I should first be able to construct a wrong query, I can always try again.

          Show
          Dag H. Wanvik added a comment - - edited Uploading a patch, derby-4679-fixcolumnno-1a, which fixes the wrong column numbers, by assigning them at the same time as the tableNumber is bound, i.e. in the fromTable classes, rather than in ColumnReference#bindExpression. The fact that JoinNode replaces the RC, will then not affect the actually bound columnNumber like before. Cf the enclosed trace patch which I have used to show what happens, trace-remapping.diff. This can be applied both with and without the patch proper. I have ported the original DERBY-2526 query over to JoinTest#testDerby_4695 and added one more query, so there are two queries, let's call them a and b. The query tree after the bind phase before and after are attached as query-a-before.log and query-a-after.log. There have been normalized, so they can be diffed to see the net difference in binding, which is that the column reference BVW.C1 was 1,6 before the patch and 1,2 after the patch. For query a, we see the following trace of remappings done as a result of join flattening before the patch: remapping CR: B3.C1 before: 0,1 after 5,1 null remapping CR: B3.C9 before: 0,2 after 5,2 null remapping CR: B3.C5 before: 0,3 after 5,3 null remapping CR: B3.C6 before: 0,4 after 5,4 null remapping CR: B4.C4 before: 6,2 after 6,2 null remapping CR: B3.C1 before: 0,1 after 0,1 null remapping CR: B3.C9 before: 0,2 after 0,2 null remapping CR: B3.C5 before: 0,3 after 0,3 null remapping CR: B3.C6 before: 0,4 after 0,4 null remapping CR: BVW.C1 before: 1,6 after 1,2 B2 <-------- wrong input, but ends up correct and the following after the patch: remapping CR: B3.C1 before: 0,1 after 5,1 null remapping CR: B3.C9 before: 0,2 after 5,2 null remapping CR: B3.C5 before: 0,3 after 5,3 null remapping CR: B3.C6 before: 0,4 after 5,4 null remapping CR: B4.C4 before: 6,2 after 6,2 null remapping CR: B3.C1 before: 0,1 after 0,1 null remapping CR: B3.C9 before: 0,2 after 0,2 null remapping CR: B3.C5 before: 0,3 after 0,3 null remapping CR: B3.C6 before: 0,4 after 0,4 null remapping CR: BVW.C1 before: 1,2 after 1,2 B2 <--- correct both before and after For query b, which is a more complicated join, I also upload similar log files, query-b-before.log and query-b-after.log, which also only differ in their column reference BVW.C1 (several this time, since the column in used in more places). The query looks like this: select b3.*, bvw.c1 from b3 inner join bvw on (b3.c1 = bvw.c5) inner join b4 on (bvw.c1 = b4.c7) inner join b on (bvw.c1 = b.c1) inner join b bcorr on bvw.c1 = bcorr.c1 where b4.c4 = 42" which is a bit meaningless, but exercises several flattenings involving bvw.c1. This query also gave a correct answer, but as we can see below, during the intermediate steps, the binding of RC ends up wrong (also one a la DERBY-4679 , due to falling back on column name based matching: since the (tablenumber, column number) pair was wrong at the outset, and so made the new lookup method introduced for DERBY-4679 fail to do its job. I believe, although I haven't yet been able to prove it, that the wrong numbering could lead to bugs: wrong or missing transitive closure computation since this is based on (tn, cn) lead to yet other wrong intermediate CRs (shown to happen thrice in the example below, although it doesn't lead to query error here) which could then in turn confuse transitive closure computation, as seen in DERBY-4679 (CR bound to wrong column instance in a JOIN RCL). Note that this represents another mode of "correct t-no, wrong col-no" happening: the primary issue here is the wrong binding in the bind phase, this second mode happens during rebinding in the optimizer's preprocess phase. I.e. error mode one leads to error mode 2, which has been shown to cause bugs. Now, let us look at the differences for query b. Without the patch we see: remapping CR: B3.C1 before: 0,1 after 9,1 null remapping CR: B3.C9 before: 0,2 after 9,2 null remapping CR: B3.C5 before: 0,3 after 9,3 null remapping CR: B3.C6 before: 0,4 after 9,4 null remapping CR: BVW.C1 before: 1,2 after 9,6 B2 remapping CR: B4.C4 before: 6,2 after 9,11 null remapping CR: B3.C1 before: 0,1 after 7,1 null remapping CR: B3.C9 before: 0,2 after 7,2 null remapping CR: B3.C5 before: 0,3 after 7,3 null remapping CR: B3.C6 before: 0,4 after 7,4 null remapping CR: BVW.C1 before: 1,2 after 7,6 B2 remapping CR: B4.C4 before: 6,2 after 7,11 null remapping CR: BVW.C1 before: 1,6 after 7,1 null <--- wrong! remapping CR: B3.C1 before: 0,1 after 5,1 null remapping CR: B3.C9 before: 0,2 after 5,2 null remapping CR: B3.C5 before: 0,3 after 5,3 null remapping CR: B3.C6 before: 0,4 after 5,4 null remapping CR: BVW.C1 before: 1,2 after 5,6 B2 remapping CR: B4.C4 before: 6,2 after 6,2 null remapping CR: BVW.C1 before: 1,6 after 5,1 null <--- wrong remapping CR: BVW.C1 before: 1,6 after 5,1 null <--- wrong remapping CR: B3.C1 before: 0,1 after 0,1 null remapping CR: B3.C9 before: 0,2 after 0,2 null remapping CR: B3.C5 before: 0,3 after 0,3 null remapping CR: B3.C6 before: 0,4 after 0,4 null remapping CR: BVW.C1 before: 1,2 after 1,2 B2 remapping CR: BVW.C1 before: 1,6 after 1,2 B2 <--- correct with wrong input (rescued ultimately!) remapping CR: BVW.C1 before: 1,6 after 1,2 B2 <--- ditto remapping CR: BVW.C1 before: 1,6 after 1,2 B2 <--- ditto vs after the patch: remapping CR: B3.C1 before: 0,1 after 9,1 null remapping CR: B3.C9 before: 0,2 after 9,2 null remapping CR: B3.C5 before: 0,3 after 9,3 null remapping CR: B3.C6 before: 0,4 after 9,4 null remapping CR: BVW.C1 before: 1,2 after 9,6 B2 remapping CR: B4.C4 before: 6,2 after 9,11 null remapping CR: B3.C1 before: 0,1 after 7,1 null remapping CR: B3.C9 before: 0,2 after 7,2 null remapping CR: B3.C5 before: 0,3 after 7,3 null remapping CR: B3.C6 before: 0,4 after 7,4 null remapping CR: BVW.C1 before: 1,2 after 7,6 B2 remapping CR: B4.C4 before: 6,2 after 7,11 null remapping CR: BVW.C1 before: 1,2 after 7,6 B2 <--- correct remapping CR: B3.C1 before: 0,1 after 5,1 null remapping CR: B3.C9 before: 0,2 after 5,2 null remapping CR: B3.C5 before: 0,3 after 5,3 null remapping CR: B3.C6 before: 0,4 after 5,4 null remapping CR: BVW.C1 before: 1,2 after 5,6 B2 remapping CR: B4.C4 before: 6,2 after 6,2 null remapping CR: BVW.C1 before: 1,2 after 5,6 B2 <--- correct remapping CR: BVW.C1 before: 1,2 after 5,6 B2 <--- correct remapping CR: B3.C1 before: 0,1 after 0,1 null remapping CR: B3.C9 before: 0,2 after 0,2 null remapping CR: B3.C5 before: 0,3 after 0,3 null remapping CR: B3.C6 before: 0,4 after 0,4 null remapping CR: BVW.C1 before: 1,2 after 1,2 B2 remapping CR: BVW.C1 before: 1,2 after 1,2 B2 <--- correct remapping CR: BVW.C1 before: 1,2 after 1,2 B2 <--- correct remapping CR: BVW.C1 before: 1,2 after 1,2 B2 <--- correct As can be seen, the fact that BVW.C1 on the join condition is now correctly bound, makes the fix for DERBY-4679 kick in in yet another instance, and return (7,6) rather than (7,1) after the second flattening. [ To see that (7,6) is correct, if we look inside the binding log file for query b, we see that table 7 corresponds to the join node with join clause BVW.C1 = B4.C7. This join has a RCL which is 12 columns long. We can see that RC (7,1) corresponds to B3.C1, which is wrong. After the patch, we see the binding is done to (9,6), which is correct, since that points to the correct RC in the RCL of table 7, that is a RC corresponding to RC in the join below, which ultimately stems from the subquery column BVW.C1. ] So, we see from the trace that for each flattening, the binding for BVW.C1 stays consistent during the flattening phases with the patch, whereas before it was wrong in several places, and also led to binding to the wrong RC in three cases, although the final flattening saved the day since the CR is then remapped against a RCL that had no duplicates in it - i.e. the subquery representing the view - so (1,6) finally became (1,2). The patch ran regressions ok. While I haven't yet been able to prove that this issue could lead to query errors, I do think it could, for the reasons cited above. I therefore recommend we commit this patch as a defensive measure. If people think I should first be able to construct a wrong query, I can always try again.
          Dag H. Wanvik made changes -
          Attachment derby-4695-fixcolumnno-1a.diff [ 12448232 ]
          Attachment derby-4695-fixcolumnno-1a.stat [ 12448233 ]
          Dag H. Wanvik made changes -
          Attachment query-a-before.log [ 12448234 ]
          Attachment query-a-after.log [ 12448235 ]
          Attachment query-b-before.log [ 12448236 ]
          Dag H. Wanvik made changes -
          Attachment query-b-after.log [ 12448237 ]
          Dag H. Wanvik made changes -
          Issue & fix info [Patch Available]
          Dag H. Wanvik made changes -
          Attachment trace-remapping.diff [ 12448239 ]
          Hide
          Dag H. Wanvik added a comment -

          Since this patch makes the intermediate (tableNumber, columnNumber) states (more) consistent and doesn't break anything as far as I can verify it, I propose to commit this patch soon unless somebody has objections.

          Show
          Dag H. Wanvik added a comment - Since this patch makes the intermediate (tableNumber, columnNumber) states (more) consistent and doesn't break anything as far as I can verify it, I propose to commit this patch soon unless somebody has objections.
          Hide
          Dag H. Wanvik added a comment -

          Uploading a slightly improved version of the patch, derby-4695-fixcolumnno-1b. Relative to 1a,
          I improved some comments and improved some code formatting, that's all.
          Regressions ran ok.

          Again, the corrected behavior can be seen by running the new test case, testDerby_4695 without and with the patch trace-remapping.diff applied.

          Show
          Dag H. Wanvik added a comment - Uploading a slightly improved version of the patch, derby-4695-fixcolumnno-1b. Relative to 1a, I improved some comments and improved some code formatting, that's all. Regressions ran ok. Again, the corrected behavior can be seen by running the new test case, testDerby_4695 without and with the patch trace-remapping.diff applied.
          Dag H. Wanvik made changes -
          Attachment derby-4695-fixcolumnno-1b.diff [ 12452822 ]
          Attachment derby-4695-fixcolumnno-1b.stat [ 12452823 ]
          Hide
          Dag H. Wanvik added a comment -

          Commited patch 1b at svn 988204, closing. I do not intend to backport this one.

          Show
          Dag H. Wanvik added a comment - Commited patch 1b at svn 988204, closing. I do not intend to backport this one.
          Dag H. Wanvik made changes -
          Status Open [ 1 ] Closed [ 6 ]
          Resolution Fixed [ 1 ]
          Dag H. Wanvik made changes -
          Fix Version/s 10.7.0.0 [ 12314971 ]
          Issue & fix info [Patch Available]
          Hide
          Dag H. Wanvik added a comment -

          Changing Issue type to "Improvement", since we haven't seen any bugs from this inconsistency.

          Show
          Dag H. Wanvik added a comment - Changing Issue type to "Improvement", since we haven't seen any bugs from this inconsistency.
          Dag H. Wanvik made changes -
          Issue Type Bug [ 1 ] Improvement [ 4 ]
          Rick Hillegas made changes -
          Affects Version/s 10.7.1.1 [ 12315564 ]
          Affects Version/s 10.7.1.0 [ 12314971 ]
          Fix Version/s 10.7.1.1 [ 12315564 ]
          Fix Version/s 10.7.1.0 [ 12314971 ]
          Dag H. Wanvik made changes -
          Link This issue relates to DERBY-5933 [ DERBY-5933 ]
          Gavin made changes -
          Workflow jira [ 12512785 ] Default workflow, editable Closed status [ 12801986 ]

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Dag H. Wanvik
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development