Derby
  1. Derby
  2. DERBY-4679

Several left outer joins causes unstable query with incorrect results

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.2.1, 10.1.1.0, 10.1.2.1, 10.1.3.1, 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1, 10.3.3.0, 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0, 10.6.1.0
    • Fix Version/s: 10.5.3.2, 10.6.2.1, 10.7.1.1
    • Component/s: SQL
    • Labels:
      None
    • Environment:
    • Issue & fix info:
      Repro attached
    • Bug behavior facts:
      Deviation from standard, Seen in production, Wrong query result

      Description

      select distinct
      t1.ITEM_ID,
      t1.STATE_ID,
      t1.JZ_DISCRIMINATOR
      from (
      select *
      from
      LM.ABSTRACT_INSTANCE z1
      where
      z1.JZ_DISCRIMINATOR = 238
      ) t1 left outer join
      LM.LAB_RESOURCE_OPERATINGSYSTEM j1 on (t1.ITEM_ID = j1.JZ_PARENT_ID) left outer join
      LM.ABSTRACT_INSTANCE t2 on (j1.ITEM_ID = t2.ITEM_ID) left outer join
      LM.OPERATING_SYSTEM_SOFTWARE_INSTALL j2 on (t2.ITEM_ID = j2.JZ_PARENT_ID) left outer join
      LM.ABSTRACT_INSTANCE t3 on (j2.ITEM_ID = t3.ITEM_ID) left outer join
      LM.FAMILY t5 on (t2.FAMILY_ITEM_ID = t5.ITEM_ID) left outer join
      LM.FAMILY t7 on (t1.FAMILY_ITEM_ID = t7.ITEM_ID)
      where (
      t3.FAMILY_ITEM_ID in('_5VetVWTeEd-Q8aOqWJPEIQ') and
      (t5.ROOT_ITEM_ID = '_5ZDlwWTeEd-Q8aOqWJPEIQ') and
      (t7.ROOT_ITEM_ID = '_5nN9mmTeEd-Q8aOqWJPEIQ') and
      (t1.VISIBILITY = 0)
      )

      The above query returns no results despite the fact that the database contains results that match the query.
      Slight modifications to the query that shouldn't change the outcome cause it to return the expected results. For example: changing
      "t3.FAMILY_ITEM_ID in('_5VetVWTeEd-Q8aOqWJPEIQ')"
      to
      "t3.FAMILY_ITEM_ID in('_5VetVWTeEd-Q8aOqWJPEIQ', 'blah')"
      or
      "t3.FAMILY_ITEM_ID = '_5VetVWTeEd-Q8aOqWJPEIQ'"

      or removing
      "(t5.ROOT_ITEM_ID = '_5ZDlwWTeEd-Q8aOqWJPEIQ') and "
      despite the fact that the results show that t5.ROOT_ITEM_ID is equal to the string above.

      Note that there is no error message associated with the incorrect query, it just returns zero rows.

      The query below doesn't use join statements and works as expected.

      select distinct
      t1.item_id,
      t1.state_id,
      t1.jz_discriminator
      from (
      select *
      from
      lm.abstract_instance z1
      where
      z1.jz_discriminator = 238
      ) t1,
      lm.lab_resource_operatingsystem j1,
      lm.abstract_instance t3,
      lm.operating_system_software_install j2,
      lm.abstract_instance t2,
      lm.family t5,
      lm.family t7
      where
      t1.item_id = j1.jz_parent_id and
      j1.item_id = t2.item_id and
      t2.item_id = j2.jz_parent_id and
      j2.item_id = t3.item_id and
      t2.family_item_id = t5.item_id and
      t1.family_item_id = t7.item_id and
      t3.family_item_id in ('_5VetVWTeEd-Q8aOqWJPEIQ') and
      t5.root_item_id = '_5ZDlwWTeEd-Q8aOqWJPEIQ' and
      t7.root_item_id = '_5nN9mmTeEd-Q8aOqWJPEIQ' and
      t1.visibility = 0;

      1. derby-4679-2a.stat
        0.2 kB
        Dag H. Wanvik
      2. derby-4679-2a.diff
        14 kB
        Dag H. Wanvik
      3. derby-4679-followup.stat
        0.1 kB
        Dag H. Wanvik
      4. derby-4679-followup.diff
        4 kB
        Dag H. Wanvik
      5. derby-4679b.stat
        0.3 kB
        Dag H. Wanvik
      6. derby-4679b.diff
        12 kB
        Dag H. Wanvik
      7. derby-4679a.stat
        0.3 kB
        Dag H. Wanvik
      8. derby-4679a.diff
        12 kB
        Dag H. Wanvik
      9. drawing.txt
        7 kB
        Dag H. Wanvik
      10. drawing.txt
        7 kB
        Dag H. Wanvik
      11. Foo.java
        2 kB
        Dag H. Wanvik
      12. show.diff
        5 kB
        Dag H. Wanvik
      13. equal-after-preprocess.log
        189 kB
        Dag H. Wanvik
      14. ins-after-preprocess.log
        244 kB
        Dag H. Wanvik
      15. in-after-preprocess.log
        242 kB
        Dag H. Wanvik
      16. trace-in.log
        1 kB
        Dag H. Wanvik
      17. trace-equal.log
        3 kB
        Dag H. Wanvik
      18. trace-exec-plan.log
        15 kB
        Dag H. Wanvik
      19. sample.sql
        3 kB
        David Richards
      20. db.sql
        4 kB
        David Richards

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          Hi David,

          It would be helpful if you could post the schema for the tables involved in the query, including all constraints and indexes defined on them. Preferably also with enough rows to reproduce the problem.
          (The dblook utility could be used to extract schema information. See http://db.apache.org/derby/docs/10.6/tools/ctoolsdblook.html.)

          There's at least one wrong query result bug affecting joins (DERBY-4372) in 10.5.3.0 that has been fixed in the recently released 10.6.1.0, so it might be worthwhile to check if the bug is still present in the latest version of Derby.

          Show
          Knut Anders Hatlen added a comment - Hi David, It would be helpful if you could post the schema for the tables involved in the query, including all constraints and indexes defined on them. Preferably also with enough rows to reproduce the problem. (The dblook utility could be used to extract schema information. See http://db.apache.org/derby/docs/10.6/tools/ctoolsdblook.html .) There's at least one wrong query result bug affecting joins ( DERBY-4372 ) in 10.5.3.0 that has been fixed in the recently released 10.6.1.0, so it might be worthwhile to check if the bug is still present in the latest version of Derby.
          Hide
          David Richards added a comment -

          I've attached the output of dblook as db.sql and a simplified creation script as sample.sql that includes the relevant tables and columns with sample data. I also verified that the bug still occurs in 10.6.1.0

          Show
          David Richards added a comment - I've attached the output of dblook as db.sql and a simplified creation script as sample.sql that includes the relevant tables and columns with sample data. I also verified that the bug still occurs in 10.6.1.0
          Hide
          Knut Anders Hatlen added a comment -

          Thanks David. I'm able to reproduce it with the scripts you attached.

          I'm a little puzzled by the fact that rewriting "t3.FAMILY_ITEM_ID in('_5VetVWTeEd-Q8aOqWJPEIQ')" to "t3.FAMILY_ITEM_ID = '_5VetVWTeEd-Q8aOqWJPEIQ'" changes the result, since Derby actually does that internally before the statement is compiled.

          Show
          Knut Anders Hatlen added a comment - Thanks David. I'm able to reproduce it with the scripts you attached. I'm a little puzzled by the fact that rewriting "t3.FAMILY_ITEM_ID in('_5VetVWTeEd-Q8aOqWJPEIQ')" to "t3.FAMILY_ITEM_ID = '_5VetVWTeEd-Q8aOqWJPEIQ'" changes the result, since Derby actually does that internally before the statement is compiled.
          Hide
          Dag H. Wanvik added a comment - - edited

          With this modified query (inner join t3)

          select distinct t1.ITEM_ID, t1.state_id, t1.JZ_DISCRIMINATOR from
          (select * from LM.ABSTRACT_INSTANCE z1 where z1.JZ_DISCRIMINATOR = 238) t1 left outer join
          LM.LAB_RESOURCE_OPERATINGSYSTEM j1 on (t1.ITEM_ID = j1.JZ_PARENT_ID) left outer join
          LM.ABSTRACT_INSTANCE t2 on (j1.ITEM_ID = t2.ITEM_ID) left outer join
          LM.OPERATING_SYSTEM_SOFTWARE_INSTALL j2 on (t2.ITEM_ID = j2.JZ_PARENT_ID) inner join
          LM.ABSTRACT_INSTANCE t3 on (j2.ITEM_ID = t3.ITEM_ID) left outer join
          LM.FAMILY t5 on (t2.FAMILY_ITEM_ID = t5.ITEM_ID) left outer join
          LM.FAMILY t7 on (t1.FAMILY_ITEM_ID = t7.ITEM_ID) where
          (t3.FAMILY_ITEM_ID in('_5VetVWTeEd-Q8aOqWJPEIQ') and
          (t5.ROOT_ITEM_ID = '_5ZDlwWTeEd-Q8aOqWJPEIQ') and
          (t7.ROOT_ITEM_ID = '_5nN9mmTeEd-Q8aOqWJPEIQ'));

          the query works also with the IN formulation. With the equality
          formulation, which was seen to work, the left outer join is
          internally rewritten to inner join. Since the single value IN is rewritten to
          equality, Derby could have done that rewrite internally also for that
          case, and the problem would presumably have gone away. I don't this
          this is (necessarily) the root problem, though.

          With multiple values in the IN clause, this rewrite of the JOIN does
          not happen, however, but that formulation also works. I tried to
          disable temporarily the rewrite to equality for the single value case
          (InListOperatorNode#preprocess), so as to use the multi-probing
          strategy, but the query still fails. (edited: actually it works, finger trouble, see next comment)

          Show
          Dag H. Wanvik added a comment - - edited With this modified query (inner join t3) select distinct t1.ITEM_ID, t1.state_id, t1.JZ_DISCRIMINATOR from (select * from LM.ABSTRACT_INSTANCE z1 where z1.JZ_DISCRIMINATOR = 238) t1 left outer join LM.LAB_RESOURCE_OPERATINGSYSTEM j1 on (t1.ITEM_ID = j1.JZ_PARENT_ID) left outer join LM.ABSTRACT_INSTANCE t2 on (j1.ITEM_ID = t2.ITEM_ID) left outer join LM.OPERATING_SYSTEM_SOFTWARE_INSTALL j2 on (t2.ITEM_ID = j2.JZ_PARENT_ID) inner join LM.ABSTRACT_INSTANCE t3 on (j2.ITEM_ID = t3.ITEM_ID) left outer join LM.FAMILY t5 on (t2.FAMILY_ITEM_ID = t5.ITEM_ID) left outer join LM.FAMILY t7 on (t1.FAMILY_ITEM_ID = t7.ITEM_ID) where (t3.FAMILY_ITEM_ID in('_5VetVWTeEd-Q8aOqWJPEIQ') and (t5.ROOT_ITEM_ID = '_5ZDlwWTeEd-Q8aOqWJPEIQ') and (t7.ROOT_ITEM_ID = '_5nN9mmTeEd-Q8aOqWJPEIQ')); the query works also with the IN formulation. With the equality formulation, which was seen to work, the left outer join is internally rewritten to inner join. Since the single value IN is rewritten to equality, Derby could have done that rewrite internally also for that case, and the problem would presumably have gone away. I don't this this is (necessarily) the root problem, though. With multiple values in the IN clause, this rewrite of the JOIN does not happen, however, but that formulation also works. I tried to disable temporarily the rewrite to equality for the single value case (InListOperatorNode#preprocess), so as to use the multi-probing strategy, but the query still fails. (edited: actually it works, finger trouble, see next comment)
          Hide
          Dag H. Wanvik added a comment -

          As far as trying the multi-probing case, I was foiled by the check for same min and max value, which also short circuits to equality (InListOperatorNode, ca line 282). By also commenting out that section, thus forcing the probing strategy, the query worked.
          So, it's the transformation to equality in this context which makes it fail somehow.

          Show
          Dag H. Wanvik added a comment - As far as trying the multi-probing case, I was foiled by the check for same min and max value, which also short circuits to equality (InListOperatorNode, ca line 282). By also commenting out that section, thus forcing the probing strategy, the query worked. So, it's the transformation to equality in this context which makes it fail somehow.
          Hide
          Dag H. Wanvik added a comment -

          See this all the way back to 10.0, so not a regression.

          Show
          Dag H. Wanvik added a comment - See this all the way back to 10.0, so not a regression.
          Hide
          Dag H. Wanvik added a comment - - edited

          I enabled the calls of traceScanParameters in TableScanResultSet for the IN formulation and the equal formulation, uploaded as
          trace-equal.log and trace-in.log. As can be seen, for the IN case, there are two qualifiers being applied to the table FAMILY,
          whereas in the equal case there is only one, column id 1 (ITEM_ID), which is reasonable. In the IN case, there is also pushed down a qualifier on
          column id 0 (STATE_ID) which seems wrong. Presumably this is why no rows qualify from FAMILY and hence leads to now rows in all result sets (see enclosed
          execution plan, trace-exec-plan.log).

          – Edited: The columns mentioned above for FAMILY should be 0: ITEM_ID, 1:ROOT_ITEM_ID. Later investigation shows that the qualifier on column 0 is wrong, see below. (STATE_ID is in another table, sorry for the confusion).

          Show
          Dag H. Wanvik added a comment - - edited I enabled the calls of traceScanParameters in TableScanResultSet for the IN formulation and the equal formulation, uploaded as trace-equal.log and trace-in.log. As can be seen, for the IN case, there are two qualifiers being applied to the table FAMILY, whereas in the equal case there is only one, column id 1 (ITEM_ID), which is reasonable. In the IN case, there is also pushed down a qualifier on column id 0 (STATE_ID) which seems wrong. Presumably this is why no rows qualify from FAMILY and hence leads to now rows in all result sets (see enclosed execution plan, trace-exec-plan.log). – Edited: The columns mentioned above for FAMILY should be 0: ITEM_ID, 1:ROOT_ITEM_ID. Later investigation shows that the qualifier on column 0 is wrong, see below. (STATE_ID is in another table, sorry for the confusion).
          Hide
          Dag H. Wanvik added a comment -

          Uploading the AST dump just after DMLStatementNode#optimizeStatement has called preprocess:

          • in-after-preprocess.log (wrong result)
          • ins-after-preprocess.log (correct result)
          • equal-after-preprocess.log (correct result)

          I have normalized the node hash numbers for ease of comparison to be ascending integers.

          Show
          Dag H. Wanvik added a comment - Uploading the AST dump just after DMLStatementNode#optimizeStatement has called preprocess: in-after-preprocess.log (wrong result) ins-after-preprocess.log (correct result) equal-after-preprocess.log (correct result) I have normalized the node hash numbers for ease of comparison to be ascending integers.
          Hide
          Dag H. Wanvik added a comment -

          Looking at in-after-preprocess.log, line 4551, I see a restriction predicate on T7 (LM.FAMILY):

          T7.ITEM_ID = CharConstantNode@331['_5VetVWTeEd-Q8aOqWJPEIQ'].

          This is wrong, because this constant is only ever used in the query against table T3 (LM.ABSTRACT_INSTANCE),
          and then against column FAMILY_ITEM_ID. This is the cause of the wrong qualifier which leads to 0 rows from FAMILY as seen.

          On the other hand, in ins-after-preprocess (correct result), we see two predicates in the final where clause (because the last two joins have been flattened, 4
          halfouterjoinnodes remain:

          a) t1.FAMILY_ITEM_ID = T7.ITEM_ID
          b) t2.FAMILY_ITEM_ID = T5.ITEM_ID

          For the single IN case (erroneous), we also have only 4 remaining halfouterjoinnodes, but the b) predicate in the final where clause is missing.

          In the equal-after-preprocess.log we see only three halfouterjoinnodes, since 3 joins have been flattened, and the final where predicate contains
          a) and b) plus a new

          c) J2.ITEM_ID = T3.ITEM_ID

          which corresponds to the third but the last join, which have been flattened because of the equality.
          In the erroneous case, the left outer join was not converted to an inner join since it did not contain an = operator at the time of LOJ->IJ conversion, so flattening did not happen
          and we are left with 4 halfouterjoinnodes. That is less than optimal, but should still work.

          Somehow the preprocessing (all is fine after bind) of the simple IN case has gone wrong.

          Show
          Dag H. Wanvik added a comment - Looking at in-after-preprocess.log, line 4551, I see a restriction predicate on T7 (LM.FAMILY): T7.ITEM_ID = CharConstantNode@331 ['_5VetVWTeEd-Q8aOqWJPEIQ'] . This is wrong, because this constant is only ever used in the query against table T3 (LM.ABSTRACT_INSTANCE), and then against column FAMILY_ITEM_ID. This is the cause of the wrong qualifier which leads to 0 rows from FAMILY as seen. On the other hand, in ins-after-preprocess (correct result), we see two predicates in the final where clause (because the last two joins have been flattened, 4 halfouterjoinnodes remain: a) t1.FAMILY_ITEM_ID = T7.ITEM_ID b) t2.FAMILY_ITEM_ID = T5.ITEM_ID For the single IN case (erroneous), we also have only 4 remaining halfouterjoinnodes, but the b) predicate in the final where clause is missing. In the equal-after-preprocess.log we see only three halfouterjoinnodes, since 3 joins have been flattened, and the final where predicate contains a) and b) plus a new c) J2.ITEM_ID = T3.ITEM_ID which corresponds to the third but the last join, which have been flattened because of the equality. In the erroneous case, the left outer join was not converted to an inner join since it did not contain an = operator at the time of LOJ->IJ conversion, so flattening did not happen and we are left with 4 halfouterjoinnodes. That is less than optimal, but should still work. Somehow the preprocessing (all is fine after bind) of the simple IN case has gone wrong.
          Hide
          Dag H. Wanvik added a comment -

          Two data points:

          • Disabling LOJ reordeable has no effect, so it is probably not related to DERBY-4471
          • Disabling the call to transforOuterJoins in FromList#preprocess makes the query work
          Show
          Dag H. Wanvik added a comment - Two data points: Disabling LOJ reordeable has no effect, so it is probably not related to DERBY-4471 Disabling the call to transforOuterJoins in FromList#preprocess makes the query work
          Hide
          Dag H. Wanvik added a comment - - edited

          While keeping the call to transforOuterJoins in FromList#preprocess disabled, I manage to get this version of the query to fail as well. The predicate on t3.FAMILY_ITEM_ID uses explicit equality here, but it not lead to rewrite to inner join of the LOJ with t3 in this case since I disabled transforOuterJoins.

          "select distinct t1.ITEM_ID, t1.state_id, t1.JZ_DISCRIMINATOR from
          (((((((select * from LM.ABSTRACT_INSTANCE z1 where z1.JZ_DISCRIMINATOR = 238) t1
          left outer join LM.LAB_RESOURCE_OPERATINGSYSTEM j1 on (t1.ITEM_ID = j1.JZ_PARENT_ID))
          left outer join LM.ABSTRACT_INSTANCE t2 on (j1.ITEM_ID = t2.ITEM_ID))
          left outer join LM.OPERATING_SYSTEM_SOFTWARE_INSTALL j2 on (t2.ITEM_ID = j2.JZ_PARENT_ID))
          left outer join LM.ABSTRACT_INSTANCE t3 on (j2.ITEM_ID = t3.ITEM_ID))
          inner join LM.FAMILY t5 on (t2.FAMILY_ITEM_ID = t5.ITEM_ID))
          inner join LM.FAMILY t7 on (t1.FAMILY_ITEM_ID = t7.ITEM_ID))
          where (t3.FAMILY_ITEM_ID ='_5VetVWTeEd-Q8aOqWJPEIQ' and <------------ explicit "=" also fails when LOJ -> IJ is disabled
          (t5.ROOT_ITEM_ID = '_5ZDlwWTeEd-Q8aOqWJPEIQ') and
          (t7.ROOT_ITEM_ID ='_5nN9mmTeEd-Q8aOqWJPEIQ'))"

          The LOJ->IJ rewrite is not logically required and the bug is now seen to be present without the LOJ->IJ rewriting happeing.

          In the above query, we have four remaining LOJs. This is also what resulted in the original case after IN is transformed to '=", since that happens too late for LOJ->IJ to kick in for that case. It does happen for the two last joins, as manually replicated in the above experiment.

          I see that the reason the predicate "= _5VetVWTeEd-Q8aOqWJPEIQ" ends up on T7.ITEM_ID is because of the transitive closure computation in PredicatList#searchClauseTransitiveClosure.

          Could it be that the precondition for that is wrong here: a predicate on t3 which has not been flattened is applied transitively to the T7, which has been?

          Show
          Dag H. Wanvik added a comment - - edited While keeping the call to transforOuterJoins in FromList#preprocess disabled, I manage to get this version of the query to fail as well. The predicate on t3.FAMILY_ITEM_ID uses explicit equality here, but it not lead to rewrite to inner join of the LOJ with t3 in this case since I disabled transforOuterJoins. "select distinct t1.ITEM_ID, t1.state_id, t1.JZ_DISCRIMINATOR from (((((((select * from LM.ABSTRACT_INSTANCE z1 where z1.JZ_DISCRIMINATOR = 238) t1 left outer join LM.LAB_RESOURCE_OPERATINGSYSTEM j1 on (t1.ITEM_ID = j1.JZ_PARENT_ID)) left outer join LM.ABSTRACT_INSTANCE t2 on (j1.ITEM_ID = t2.ITEM_ID)) left outer join LM.OPERATING_SYSTEM_SOFTWARE_INSTALL j2 on (t2.ITEM_ID = j2.JZ_PARENT_ID)) left outer join LM.ABSTRACT_INSTANCE t3 on (j2.ITEM_ID = t3.ITEM_ID)) inner join LM.FAMILY t5 on (t2.FAMILY_ITEM_ID = t5.ITEM_ID)) inner join LM.FAMILY t7 on (t1.FAMILY_ITEM_ID = t7.ITEM_ID)) where (t3.FAMILY_ITEM_ID ='_5VetVWTeEd-Q8aOqWJPEIQ' and <------------ explicit "=" also fails when LOJ -> IJ is disabled (t5.ROOT_ITEM_ID = '_5ZDlwWTeEd-Q8aOqWJPEIQ') and (t7.ROOT_ITEM_ID ='_5nN9mmTeEd-Q8aOqWJPEIQ'))" The LOJ->IJ rewrite is not logically required and the bug is now seen to be present without the LOJ->IJ rewriting happeing. In the above query, we have four remaining LOJs. This is also what resulted in the original case after IN is transformed to '=", since that happens too late for LOJ->IJ to kick in for that case. It does happen for the two last joins, as manually replicated in the above experiment. I see that the reason the predicate "= _5VetVWTeEd-Q8aOqWJPEIQ" ends up on T7.ITEM_ID is because of the transitive closure computation in PredicatList#searchClauseTransitiveClosure. Could it be that the precondition for that is wrong here: a predicate on t3 which has not been flattened is applied transitively to the T7, which has been?
          Hide
          Dag H. Wanvik added a comment -

          I am starting to convince myself it is plain wrong to apply the transitive closure to predicates on tables not from the outer equi-joins, cf
          the explanation in the Sybase manual:

          http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1502/html/queryprocessing/queryprocessing8.htm

          (quote):
          (predicate) Transitive closure is used only for normal equi-joins, as shown. join transitive closure is not performed for:

          • Non-equi-joins; for example, t1.c1 > t2.c2
          • Outer joins; for example, t1.c11 *= t2.c2, or left join or right join (<------------ NOTE! -Dag's emphasis)
          • Joins across subquery boundaries
          • Joins used to check referential integrity or the with check option on views
            (quote end)

          as well as the Derby docs which explains its rationale here:

          http://db.apache.org/derby/docs/10.6/tuning/rtuntransform866587.html

          Show
          Dag H. Wanvik added a comment - I am starting to convince myself it is plain wrong to apply the transitive closure to predicates on tables not from the outer equi-joins, cf the explanation in the Sybase manual: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1502/html/queryprocessing/queryprocessing8.htm (quote): (predicate) Transitive closure is used only for normal equi-joins, as shown. join transitive closure is not performed for: Non-equi-joins; for example, t1.c1 > t2.c2 Outer joins; for example, t1.c11 *= t2.c2, or left join or right join (<------------ NOTE! -Dag's emphasis) Joins across subquery boundaries Joins used to check referential integrity or the with check option on views (quote end) as well as the Derby docs which explains its rationale here: http://db.apache.org/derby/docs/10.6/tuning/rtuntransform866587.html
          Hide
          Dag H. Wanvik added a comment - - edited

          I see that in the table number for t3 in the predicate "t3.FAMILY_ITEM_ID ='_5VetVWTeEd-Q8aOqWJPEIQ'" is 9.
          However, so is also the table number for t1 in the equi-join clause "t1.FAMILY_ITEM_ID = t7.ITEM_ID)".

          This is the immediate reason why the bad predicate is added by the searchClauseTransitiveClosure logic, t3 is assumed to be the same as t1.

          t1 and t3 both have the same underlying base table, i.e. LM.ABSTRACT_INSTANCE.

          Show
          Dag H. Wanvik added a comment - - edited I see that in the table number for t3 in the predicate "t3.FAMILY_ITEM_ID ='_5VetVWTeEd-Q8aOqWJPEIQ'" is 9. However, so is also the table number for t1 in the equi-join clause "t1.FAMILY_ITEM_ID = t7.ITEM_ID)". This is the immediate reason why the bad predicate is added by the searchClauseTransitiveClosure logic, t3 is assumed to be the same as t1. t1 and t3 both have the same underlying base table, i.e. LM.ABSTRACT_INSTANCE.
          Hide
          Dag H. Wanvik added a comment -

          I now believe this is a bug related to DERBY-2526 and DERBY-3023.
          When equi-joins are flattened (JoinNode#flatten), the column references are remapped, cf. calls to
          remapColumnReferencesToExpressions.

          DERBY-2526 uncovered a weakness in this remapping in that only table
          number were updated, not column numbers. The fix to DERBY-2526
          contains this line in
          ColumnReference#remapColumnReferencesToExpressions:

          ResultColumn ftRC =
          ft.getResultColumns().getResultColumn(columnName);

          Unfortunately, in this query, using the columnName will not uniquely
          identify the correct result column, since the multi-way's concatenated
          list of result columns (cf. the write-up attached to DERBY-2526:
          https://issues.apache.org/jira/secure/attachment/12358291/d2526_v1.html)
          contains several columns with the same names, resulting from the fact
          that the query contains self-joins on ABSTRACT_INSTANCE): z1/t1, t2
          and t3.

          To see this happening, apply the patch show.diff and run the attached
          program Foo.java. I believe this causes one or more of remappings to
          go awry. The immediate problem is that the remapping of
          t3.FAMILY_ITEM_ID chooses the first occurence of FAMILY_ITEM_ID and
          ands up with the occurence for z1/t1, which leads to the fatal mixup.

          Output seen:


          map before: T1.ITEM_ID [null] (tablenum, colnum) = (0, 2) VCid=2
          Found 6 columns of the name ITEM_ID
          map after: T1.ITEM_ID [null] (tablenum, colnum) = (11, 2) VCid=2 map from VirtualColumnId
          map before: T1.STATE_ID [null] (tablenum, colnum) = (0, 4) VCid=4
          Found 3 columns of the name STATE_ID
          map after: T1.STATE_ID [null] (tablenum, colnum) = (11, 4) VCid=4 map from VirtualColumnId
          map before: T1.JZ_DISCRIMINATOR [null] (tablenum, colnum) = (0, 1) VCid=1
          Found 3 columns of the name JZ_DISCRIMINATOR
          map after: T1.JZ_DISCRIMINATOR [null] (tablenum, colnum) = (11, 1) VCid=1 map from VirtualColumnId
          map before: T3.FAMILY_ITEM_ID [null] (tablenum, colnum) = (8, 3) VCid=17
          Found 3 columns of the name FAMILY_ITEM_ID
          map after: T3.FAMILY_ITEM_ID [null] (tablenum, colnum) = (11, 3) VCid=17 map from VirtualColumnId
          map before: T5.ROOT_ITEM_ID [null] (tablenum, colnum) = (10, 2) VCid=21
          map after: T5.ROOT_ITEM_ID [null] (tablenum, colnum) = (11, 21) VCid=21 map from VirtualColumnId
          map before: T7.ROOT_ITEM_ID [T7] (tablenum, colnum) = (12, 2) VCid=23
          map after: T7.ROOT_ITEM_ID [T7] (tablenum, colnum) = (12, 2) VCid=23 map from ColumnPosition
          map before: T1.VISIBILITY [null] (tablenum, colnum) = (0, 5) VCid=5
          Found 3 columns of the name VISIBILITY
          map after: T1.VISIBILITY [null] (tablenum, colnum) = (11, 5) VCid=5 map from VirtualColumnId


          map before: T1.ITEM_ID [null] (tablenum, colnum) = (11, 2) VCid=2
          Found 5 columns of the name ITEM_ID
          map after: T1.ITEM_ID [null] (tablenum, colnum) = (9, 2) VCid=2 map from VirtualColumnId
          map before: T1.STATE_ID [null] (tablenum, colnum) = (11, 4) VCid=4
          map after: T1.STATE_ID [null] (tablenum, colnum) = (9, 4) VCid=4 map from VirtualColumnId
          map before: T1.JZ_DISCRIMINATOR [null] (tablenum, colnum) = (11, 1) VCid=1
          map after: T1.JZ_DISCRIMINATOR [null] (tablenum, colnum) = (9, 1) VCid=1 map from VirtualColumnId
          map before: T3.FAMILY_ITEM_ID [null] (tablenum, colnum) = (11, 3) VCid=17
          Found 3 columns of the name FAMILY_ITEM_ID
          map after: T3.FAMILY_ITEM_ID [null] (tablenum, colnum) = (9, 3) VCid=17 map from VirtualColumnId
          map before: T5.ROOT_ITEM_ID [T5] (tablenum, colnum) = (11, 21) VCid=21
          map after: T5.ROOT_ITEM_ID [T5] (tablenum, colnum) = (10, 2) VCid=21 map from ColumnPosition
          map before: T1.VISIBILITY [null] (tablenum, colnum) = (11, 5) VCid=5
          map after: T1.VISIBILITY [null] (tablenum, colnum) = (9, 5) VCid=5 map from VirtualColumnId
          map before: T1.FAMILY_ITEM_ID [null] (tablenum, colnum) = (0, 3) VCid=3
          Found 3 columns of the name FAMILY_ITEM_ID
          map after: T1.FAMILY_ITEM_ID [null] (tablenum, colnum) = (9, 3) VCid=3 map from VirtualColumnId


          Show
          Dag H. Wanvik added a comment - I now believe this is a bug related to DERBY-2526 and DERBY-3023 . When equi-joins are flattened (JoinNode#flatten), the column references are remapped, cf. calls to remapColumnReferencesToExpressions. DERBY-2526 uncovered a weakness in this remapping in that only table number were updated, not column numbers. The fix to DERBY-2526 contains this line in ColumnReference#remapColumnReferencesToExpressions: ResultColumn ftRC = ft.getResultColumns().getResultColumn(columnName); Unfortunately, in this query, using the columnName will not uniquely identify the correct result column, since the multi-way's concatenated list of result columns (cf. the write-up attached to DERBY-2526 : https://issues.apache.org/jira/secure/attachment/12358291/d2526_v1.html ) contains several columns with the same names, resulting from the fact that the query contains self-joins on ABSTRACT_INSTANCE): z1/t1, t2 and t3. To see this happening, apply the patch show.diff and run the attached program Foo.java. I believe this causes one or more of remappings to go awry. The immediate problem is that the remapping of t3.FAMILY_ITEM_ID chooses the first occurence of FAMILY_ITEM_ID and ands up with the occurence for z1/t1, which leads to the fatal mixup. Output seen: map before: T1.ITEM_ID [null] (tablenum, colnum) = (0, 2) VCid=2 Found 6 columns of the name ITEM_ID map after: T1.ITEM_ID [null] (tablenum, colnum) = (11, 2) VCid=2 map from VirtualColumnId map before: T1.STATE_ID [null] (tablenum, colnum) = (0, 4) VCid=4 Found 3 columns of the name STATE_ID map after: T1.STATE_ID [null] (tablenum, colnum) = (11, 4) VCid=4 map from VirtualColumnId map before: T1.JZ_DISCRIMINATOR [null] (tablenum, colnum) = (0, 1) VCid=1 Found 3 columns of the name JZ_DISCRIMINATOR map after: T1.JZ_DISCRIMINATOR [null] (tablenum, colnum) = (11, 1) VCid=1 map from VirtualColumnId map before: T3.FAMILY_ITEM_ID [null] (tablenum, colnum) = (8, 3) VCid=17 Found 3 columns of the name FAMILY_ITEM_ID map after: T3.FAMILY_ITEM_ID [null] (tablenum, colnum) = (11, 3) VCid=17 map from VirtualColumnId map before: T5.ROOT_ITEM_ID [null] (tablenum, colnum) = (10, 2) VCid=21 map after: T5.ROOT_ITEM_ID [null] (tablenum, colnum) = (11, 21) VCid=21 map from VirtualColumnId map before: T7.ROOT_ITEM_ID [T7] (tablenum, colnum) = (12, 2) VCid=23 map after: T7.ROOT_ITEM_ID [T7] (tablenum, colnum) = (12, 2) VCid=23 map from ColumnPosition map before: T1.VISIBILITY [null] (tablenum, colnum) = (0, 5) VCid=5 Found 3 columns of the name VISIBILITY map after: T1.VISIBILITY [null] (tablenum, colnum) = (11, 5) VCid=5 map from VirtualColumnId map before: T1.ITEM_ID [null] (tablenum, colnum) = (11, 2) VCid=2 Found 5 columns of the name ITEM_ID map after: T1.ITEM_ID [null] (tablenum, colnum) = (9, 2) VCid=2 map from VirtualColumnId map before: T1.STATE_ID [null] (tablenum, colnum) = (11, 4) VCid=4 map after: T1.STATE_ID [null] (tablenum, colnum) = (9, 4) VCid=4 map from VirtualColumnId map before: T1.JZ_DISCRIMINATOR [null] (tablenum, colnum) = (11, 1) VCid=1 map after: T1.JZ_DISCRIMINATOR [null] (tablenum, colnum) = (9, 1) VCid=1 map from VirtualColumnId map before: T3.FAMILY_ITEM_ID [null] (tablenum, colnum) = (11, 3) VCid=17 Found 3 columns of the name FAMILY_ITEM_ID map after: T3.FAMILY_ITEM_ID [null] (tablenum, colnum) = (9, 3) VCid=17 map from VirtualColumnId map before: T5.ROOT_ITEM_ID [T5] (tablenum, colnum) = (11, 21) VCid=21 map after: T5.ROOT_ITEM_ID [T5] (tablenum, colnum) = (10, 2) VCid=21 map from ColumnPosition map before: T1.VISIBILITY [null] (tablenum, colnum) = (11, 5) VCid=5 map after: T1.VISIBILITY [null] (tablenum, colnum) = (9, 5) VCid=5 map from VirtualColumnId map before: T1.FAMILY_ITEM_ID [null] (tablenum, colnum) = (0, 3) VCid=3 Found 3 columns of the name FAMILY_ITEM_ID map after: T1.FAMILY_ITEM_ID [null] (tablenum, colnum) = (9, 3) VCid=3 map from VirtualColumnId
          Hide
          Dag H. Wanvik added a comment -

          Just for the record, the fix to DERBY-2526 didn't cause this problem, it's just that the solution is not sufficient for this case.

          Show
          Dag H. Wanvik added a comment - Just for the record, the fix to DERBY-2526 didn't cause this problem, it's just that the solution is not sufficient for this case.
          Hide
          Dag H. Wanvik added a comment -

          Uploading drawing.txt of the query tree at three instances in time, decorated with
          (tablenumber, column number) for join predicates and search predicates:

          • before flattening
          • after flattening of IJ with T7
          • after flattening of IJ with T5

          This shows in yet another way how "t3.FAMILY_ITEM_ID" gets the wrong column number during the flattening process.

          I still don't have a suggestion for how to fix this problem, since the old approach seems broken: Somehow, the correct result column should be identified using both the table alias and the column name involved. Maybe looking into the matching result columns searching for its alias could work to disambiguate the result column search, if we are always able to find it.

          Show
          Dag H. Wanvik added a comment - Uploading drawing.txt of the query tree at three instances in time, decorated with (tablenumber, column number) for join predicates and search predicates: before flattening after flattening of IJ with T7 after flattening of IJ with T5 This shows in yet another way how "t3.FAMILY_ITEM_ID" gets the wrong column number during the flattening process. I still don't have a suggestion for how to fix this problem, since the old approach seems broken: Somehow, the correct result column should be identified using both the table alias and the column name involved. Maybe looking into the matching result columns searching for its alias could work to disambiguate the result column search, if we are always able to find it.
          Hide
          Dag H. Wanvik added a comment - - edited

          In updated drawing.txt, the search predicate which has a wrong column number is given as:

          t3.FAMILY_ITEM_ID ='_5VetVWTeEd-Q8aOqWJPEIQ'= (9,3) still wrong, should be (9,13)

          By manually tweaking it, the query ran successfully.
          So, getting the column re-binding right would fix the problem.

          Show
          Dag H. Wanvik added a comment - - edited In updated drawing.txt, the search predicate which has a wrong column number is given as: t3.FAMILY_ITEM_ID ='_5VetVWTeEd-Q8aOqWJPEIQ'= (9,3) still wrong, should be (9,13) By manually tweaking it, the query ran successfully. So, getting the column re-binding right would fix the problem.
          Hide
          Bryan Pendleton added a comment -

          > The immediate problem is that the remapping of
          > t3.FAMILY_ITEM_ID chooses the first occurence of FAMILY_ITEM_ID and
          > ands up with the occurence for z1/t1, which leads to the fatal mixup.

          Your analysis makes sense to me. Thanks for taking the time to prepare
          the clear writeups and supporting materials.

          > using the columnName will not uniquely
          > identify the correct result column, since the multi-way's concatenated
          > list of result columns contains several columns with the same names,

          It seems like the DERBY-2526 fix did not expect this to occur, and that's where
          the problem lies.

          Can we widen the interface to the getResultColumn() API so that we can
          tell it specifically which table number we want the desired column to reside in?

          Show
          Bryan Pendleton added a comment - > The immediate problem is that the remapping of > t3.FAMILY_ITEM_ID chooses the first occurence of FAMILY_ITEM_ID and > ands up with the occurence for z1/t1, which leads to the fatal mixup. Your analysis makes sense to me. Thanks for taking the time to prepare the clear writeups and supporting materials. > using the columnName will not uniquely > identify the correct result column, since the multi-way's concatenated > list of result columns contains several columns with the same names, It seems like the DERBY-2526 fix did not expect this to occur, and that's where the problem lies. Can we widen the interface to the getResultColumn() API so that we can tell it specifically which table number we want the desired column to reside in?
          Hide
          Dag H. Wanvik added a comment -

          Thanks for your comments, Bryan! Yes, I agree with you proposal that the API could be adjusted here.
          I am currently working with an experiment to extend that API method with the desired table&column number instead of the string.

          Show
          Dag H. Wanvik added a comment - Thanks for your comments, Bryan! Yes, I agree with you proposal that the API could be adjusted here. I am currently working with an experiment to extend that API method with the desired table&column number instead of the string.
          Hide
          Dag H. Wanvik added a comment -

          Attaching a patch which makes the query work basically using the approach suggested.
          It turns out to be slightly more involved than that, cf the code comments, in that

          a) we fall back on the column name approach when the CR refers to a base table
          b) we have to save the original (tn, cn) pair between flattenings to make the new approach work for multiple join flattenings.

          The repro showcases the need for both the above modifications to the basic approach.

          It also adds the test case to JoinTest in a new fixture. Regressions ran ok.

          Show
          Dag H. Wanvik added a comment - Attaching a patch which makes the query work basically using the approach suggested. It turns out to be slightly more involved than that, cf the code comments, in that a) we fall back on the column name approach when the CR refers to a base table b) we have to save the original (tn, cn) pair between flattenings to make the new approach work for multiple join flattenings. The repro showcases the need for both the above modifications to the basic approach. It also adds the test case to JoinTest in a new fixture. Regressions ran ok.
          Hide
          Bryan Pendleton added a comment -

          Hi Dag, thanks for the patch!

          I see what you mean, that it's a slightly involved patch, but I haven't thought
          of any ways to simplify this; it's a slightly involved problem after all

          The patch may have a few tab-versus-space issues, in particular the
          block comment around line 860 of ColumnReference.java seemed to
          have whitespace-only modifications.

          It might be nice to add an additional sentence or two to the method
          comments for the new getResultColumn method in ResultColumnList
          to indicate that access by the (table-number,column-number) pair is
          used by the optimizer when it is flattening queries, and has to use
          this information to uniquely distinguish the column in situations where
          the same table may appear multiple times in the queries with
          separate correlation names.

          It might be nice to add a comment to the new test case in JoinTest.java,
          perhaps something along the lines of:

          Verify that when transitive closure generates new criteria into
          the query, it isn't confused by situations where the same column name
          appears in a result column list multiple times due to
          flattening of sub-queries.

          Obviously the full details will be available in the Jira entry, but it is nice
          sometimes to have a brief description of the test's purpose in the test itself.

          Show
          Bryan Pendleton added a comment - Hi Dag, thanks for the patch! I see what you mean, that it's a slightly involved patch, but I haven't thought of any ways to simplify this; it's a slightly involved problem after all The patch may have a few tab-versus-space issues, in particular the block comment around line 860 of ColumnReference.java seemed to have whitespace-only modifications. It might be nice to add an additional sentence or two to the method comments for the new getResultColumn method in ResultColumnList to indicate that access by the (table-number,column-number) pair is used by the optimizer when it is flattening queries, and has to use this information to uniquely distinguish the column in situations where the same table may appear multiple times in the queries with separate correlation names. It might be nice to add a comment to the new test case in JoinTest.java, perhaps something along the lines of: Verify that when transitive closure generates new criteria into the query, it isn't confused by situations where the same column name appears in a result column list multiple times due to flattening of sub-queries. Obviously the full details will be available in the Jira entry, but it is nice sometimes to have a brief description of the test's purpose in the test itself.
          Hide
          Dag H. Wanvik added a comment -

          Thanks, Bryan! Yes, these are good improvement suggestions!

          As for the whitespace changes around line 860, I am aware of those,
          I was vacillating back and forth on that in order to make the diff look at readable as possible,
          but in the end, I also prefer to keeps the diffs minimal, so I'll reverse that change.

          Show
          Dag H. Wanvik added a comment - Thanks, Bryan! Yes, these are good improvement suggestions! As for the whitespace changes around line 860, I am aware of those, I was vacillating back and forth on that in order to make the diff look at readable as possible, but in the end, I also prefer to keeps the diffs minimal, so I'll reverse that change.
          Hide
          Dag H. Wanvik added a comment -

          Uploading derby-4679b, which improves on the comments as suggested by Bryan,
          and also simplified the logic slightly in ColumnReference change. Regressions ran ok.

          I forgot to mention that the patch also contains a small improvement to the dumping of query trees that I found missing during debugging: let printSubNodes also print sourceResultSet (we no longer need to worry about circularities since nodes are only printed once).

          Show
          Dag H. Wanvik added a comment - Uploading derby-4679b, which improves on the comments as suggested by Bryan, and also simplified the logic slightly in ColumnReference change. Regressions ran ok. I forgot to mention that the patch also contains a small improvement to the dumping of query trees that I found missing during debugging: let printSubNodes also print sourceResultSet (we no longer need to worry about circularities since nodes are only printed once).
          Hide
          Bryan Pendleton added a comment -

          Thanks for the updated patch, the new comments are very helpful.
          I have no further suggestions to offer; I think that your patch improves
          the system (making more queries run correctly) and I think you should
          move forward with it.

          Show
          Bryan Pendleton added a comment - Thanks for the updated patch, the new comments are very helpful. I have no further suggestions to offer; I think that your patch improves the system (making more queries run correctly) and I think you should move forward with it.
          Hide
          Dag H. Wanvik added a comment - - edited

          Thanks, Bryan, for having another look!
          Committed patch derby-4679b as svn 952237 to solve this issue. Not resolving yet, planning to backport to 10.6 branch.

          Show
          Dag H. Wanvik added a comment - - edited Thanks, Bryan, for having another look! Committed patch derby-4679b as svn 952237 to solve this issue. Not resolving yet, planning to backport to 10.6 branch.
          Hide
          Dag H. Wanvik added a comment -

          Due to the findings in DERBY-4695, I will hold off back-porting this until that issue is fixed as well.

          Show
          Dag H. Wanvik added a comment - Due to the findings in DERBY-4695 , I will hold off back-porting this until that issue is fixed as well.
          Hide
          Dag H. Wanvik added a comment -

          Uploading a follow-up patch for this issue, which:

          • adds a sanity check that a found RC has the expected column name (due to the issues seen in DERBY-4695). A wrong (tn, cn) pair might in theory lead us to bind to the wrong result column. In Sane mode, this situation will throw an assert, in insane, we will fall back on the old method of binding using the column name. This makes it safer to back-port this issue fix to the 10.6 branch.

          If/when we manage to resolve DERB-4695, we can remove the code added in this follow-up patch.
          Regressions ran ok.

          Show
          Dag H. Wanvik added a comment - Uploading a follow-up patch for this issue, which: adds a sanity check that a found RC has the expected column name (due to the issues seen in DERBY-4695 ). A wrong (tn, cn) pair might in theory lead us to bind to the wrong result column. In Sane mode, this situation will throw an assert, in insane, we will fall back on the old method of binding using the column name. This makes it safer to back-port this issue fix to the 10.6 branch. If/when we manage to resolve DERB-4695, we can remove the code added in this follow-up patch. Regressions ran ok.
          Hide
          Dag H. Wanvik added a comment - - edited

          I did some more research to understand the limitations of the approach
          taken by the fix in this issue, i.e. locating the result column based
          on the (tablenumber, columnnumber) pair. It turns out it fails to work
          also when the column reference is to a subquery, which can be
          flattened before we get the the join flattening, so the table number
          of the CR is no longer to be found, and we fall back on the "buggy" approach of
          looking for the column name described above.

          A slight reformulation of the original problem query makes the bug
          reappear:

          "select distinct t1.ITEM_ID, t1.state_id, t1.JZ_DISCRIMINATOR from " +
          "((((((select * from ABSTRACT_INSTANCE z1 where z1.JZ_DISCRIMINATOR = 238) t1 " +
          " left outer join LAB_RESOURCE_OPERATINGSYSTEM j1 on (t1.ITEM_ID = j1.JZ_PARENT_ID)) " +
          " left outer join ABSTRACT_INSTANCE t2 on (j1.ITEM_ID = t2.ITEM_ID)) " +
          " left outer join OPERATING_SYSTEM_SOFTWARE_INSTALL j2 on (t2.ITEM_ID = j2.JZ_PARENT_ID))" +
          " left outer join (select * from ABSTRACT_INSTANCE) tCorr on (j2.ITEM_ID = tCorr.ITEM_ID) " +
          " inner join FAMILY t5 on (t2.FAMILY_ITEM_ID = t5.ITEM_ID)) " +
          " inner join FAMILY t7 on (t1.FAMILY_ITEM_ID = t7.ITEM_ID)) " +
          "where (tCorr.FAMILY_ITEM_ID IN('_5VetVWTeEd-Q8aOqWJPEIQ') and " +
          " (t5.ROOT_ITEM_ID = '_5ZDlwWTeEd-Q8aOqWJPEIQ') and " +
          " (t7.ROOT_ITEM_ID ='_5nN9mmTeEd-Q8aOqWJPEIQ') and " +
          " (t1.VISIBILITY = 0))");

          Note that now, instead of t3, we now have a subquery, tCorr.

          However, during my research on DERBY-4695, I reported that skipping
          the 2nd bind block in JoinNode (unless we have an OUTER JOIN), made
          the observed numbering inconsistency go away.

          Well, it turns out that skipping the second bind for inner joins makes
          the above query variant work, too, as well as the original problem
          query in this issue, without the fix I have already committed for
          this issue!

          It could be that this is the root problem. Since it also solves the
          above variant, and removes the wrong numbering seen, it seems a better
          fix anyway, if regressions tests still hold up. We'll see that soon.

          [Edit: 2010-06-24:

          It is not the root cause, because the second bind is needed, cf
          conclusions in DERBY-4695. The second binding is somewhat
          broken/problematic, though, for two reasons:

          a) it sometimes leads to wrong column number (see DERBY-4695).

          b) we can't rebind correctly when subqueries are flattened
          (cf. the above query variant) because the node (and the table
          number) of the column reference representing the subquery
          column is gone, so the approach taken here (i.e. to match the
          result column using table number and column number of the CR)
          fails to locate a matching result column, forcing us back to
          the old approach, which fails due to column names not being
          unique.
          ]

          Show
          Dag H. Wanvik added a comment - - edited I did some more research to understand the limitations of the approach taken by the fix in this issue, i.e. locating the result column based on the (tablenumber, columnnumber) pair. It turns out it fails to work also when the column reference is to a subquery, which can be flattened before we get the the join flattening, so the table number of the CR is no longer to be found, and we fall back on the "buggy" approach of looking for the column name described above. A slight reformulation of the original problem query makes the bug reappear: "select distinct t1.ITEM_ID, t1.state_id, t1.JZ_DISCRIMINATOR from " + "((((((select * from ABSTRACT_INSTANCE z1 where z1.JZ_DISCRIMINATOR = 238) t1 " + " left outer join LAB_RESOURCE_OPERATINGSYSTEM j1 on (t1.ITEM_ID = j1.JZ_PARENT_ID)) " + " left outer join ABSTRACT_INSTANCE t2 on (j1.ITEM_ID = t2.ITEM_ID)) " + " left outer join OPERATING_SYSTEM_SOFTWARE_INSTALL j2 on (t2.ITEM_ID = j2.JZ_PARENT_ID))" + " left outer join (select * from ABSTRACT_INSTANCE) tCorr on (j2.ITEM_ID = tCorr.ITEM_ID) " + " inner join FAMILY t5 on (t2.FAMILY_ITEM_ID = t5.ITEM_ID)) " + " inner join FAMILY t7 on (t1.FAMILY_ITEM_ID = t7.ITEM_ID)) " + "where (tCorr.FAMILY_ITEM_ID IN('_5VetVWTeEd-Q8aOqWJPEIQ') and " + " (t5.ROOT_ITEM_ID = '_5ZDlwWTeEd-Q8aOqWJPEIQ') and " + " (t7.ROOT_ITEM_ID ='_5nN9mmTeEd-Q8aOqWJPEIQ') and " + " (t1.VISIBILITY = 0))"); Note that now, instead of t3, we now have a subquery, tCorr. However, during my research on DERBY-4695 , I reported that skipping the 2nd bind block in JoinNode (unless we have an OUTER JOIN), made the observed numbering inconsistency go away. Well, it turns out that skipping the second bind for inner joins makes the above query variant work, too, as well as the original problem query in this issue, without the fix I have already committed for this issue! It could be that this is the root problem. Since it also solves the above variant, and removes the wrong numbering seen, it seems a better fix anyway, if regressions tests still hold up. We'll see that soon. [Edit: 2010-06-24: It is not the root cause, because the second bind is needed, cf conclusions in DERBY-4695 . The second binding is somewhat broken/problematic, though, for two reasons: a) it sometimes leads to wrong column number (see DERBY-4695 ). b) we can't rebind correctly when subqueries are flattened (cf. the above query variant) because the node (and the table number) of the column reference representing the subquery column is gone, so the approach taken here (i.e. to match the result column using table number and column number of the CR) fails to locate a matching result column, forcing us back to the old approach, which fails due to column names not being unique. ]
          Hide
          Dag H. Wanvik added a comment -

          Sadly, this approach also breaks others queries: they all seem to be characterized by
          a mix of inner and outer joins.

          Show
          Dag H. Wanvik added a comment - Sadly, this approach also breaks others queries: they all seem to be characterized by a mix of inner and outer joins.
          Hide
          Dag H. Wanvik added a comment - - edited

          Having understood the reason for the inconsistency seen in DERBY-4695, I think I can
          conclude that the current patch, with the addition of the follow-up patch should be
          safe in the sense of not causing regressions. Therefore, I now intend to commit
          the follow-up patch and back-port both present patches to the 10.6 branch.

          Further investigations are necessary to find a way to handle to subquery variant query show above.

          Show
          Dag H. Wanvik added a comment - - edited Having understood the reason for the inconsistency seen in DERBY-4695 , I think I can conclude that the current patch, with the addition of the follow-up patch should be safe in the sense of not causing regressions. Therefore, I now intend to commit the follow-up patch and back-port both present patches to the 10.6 branch. Further investigations are necessary to find a way to handle to subquery variant query show above.
          Hide
          Dag H. Wanvik added a comment -

          Committed derby-4679-followup as svn 957260.

          Show
          Dag H. Wanvik added a comment - Committed derby-4679-followup as svn 957260.
          Hide
          Dag H. Wanvik added a comment -

          Back-ported patches to 10.6 branch as svn 957287.

          Show
          Dag H. Wanvik added a comment - Back-ported patches to 10.6 branch as svn 957287.
          Hide
          Dag H. Wanvik added a comment -

          Uploading another follow-up patch for this issue, which also handles of the
          a CR to a subquery join participant being rebound after flattening, see detailed comments in the code.
          Extra test cases are added to JoinTest#testDerby_4679. Regressions ran ok.

          Show
          Dag H. Wanvik added a comment - Uploading another follow-up patch for this issue, which also handles of the a CR to a subquery join participant being rebound after flattening, see detailed comments in the code. Extra test cases are added to JoinTest#testDerby_4679. Regressions ran ok.
          Hide
          Dag H. Wanvik added a comment -

          Committed derby-4679-2 as svn 958618.

          Show
          Dag H. Wanvik added a comment - Committed derby-4679-2 as svn 958618.
          Hide
          Dag H. Wanvik added a comment -

          Back-ported to 10.6 as svn 958621, resolving.

          David, if you think this issue has been adequately addressed, feel free to close the issue.

          Show
          Dag H. Wanvik added a comment - Back-ported to 10.6 as svn 958621, resolving. David, if you think this issue has been adequately addressed, feel free to close the issue.
          Hide
          David Richards added a comment -

          I couldn't find any information on the derby pages about planned releases. Can someone link me to or provide the information about when the versions with this fix in (10.6.1.1, 10.7.0.0) will be officially released?

          Show
          David Richards added a comment - I couldn't find any information on the derby pages about planned releases. Can someone link me to or provide the information about when the versions with this fix in (10.6.1.1, 10.7.0.0) will be officially released?
          Hide
          Kristian Waagan added a comment -

          I don't think anyone has stepped up to act as the release manager yet.
          The currently empty pages with information about potential upcoming releases are:
          http://wiki.apache.org/db-derby/DerbyTenSixTwoRelease
          http://wiki.apache.org/db-derby/DerbyTenSevenOneRelease

          Show
          Kristian Waagan added a comment - I don't think anyone has stepped up to act as the release manager yet. The currently empty pages with information about potential upcoming releases are: http://wiki.apache.org/db-derby/DerbyTenSixTwoRelease http://wiki.apache.org/db-derby/DerbyTenSevenOneRelease
          Hide
          Kathey Marsden added a comment -

          Reopen for backport.

          Show
          Kathey Marsden added a comment - Reopen for backport.
          Hide
          Kathey Marsden added a comment -

          Assign to myself temporarily for backport

          Show
          Kathey Marsden added a comment - Assign to myself temporarily for backport
          Hide
          Kathey Marsden added a comment -

          Resolving now that fix has been backported to 10.5

          Show
          Kathey Marsden added a comment - Resolving now that fix has been backported to 10.5
          Hide
          Knut Anders Hatlen added a comment -

          [bulk update: close all resolved issues that haven't had any activity the last year]

          Show
          Knut Anders Hatlen added a comment - [bulk update: close all resolved issues that haven't had any activity the last year]

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              David Richards
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development