Derby
  1. Derby
  2. DERBY-4712 Complex nested joins problems
  3. DERBY-4736

ASSERT FAIL when code generating a column reference in a join predicate in presence of other outer join reordering

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.2.0, 10.0.2.1, 10.1.1.0, 10.1.2.1, 10.1.3.1, 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1, 10.3.3.0, 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0, 10.6.1.0
    • Fix Version/s: 10.5.3.2, 10.6.2.1, 10.7.1.1
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      Repro attached

      Description

      From schema given in DERBY-4712, this query gives an ASSERT with sane Derby:

      rs = s.executeQuery("SELECT 1 FROM (T0 LEFT JOIN (T1 LEFT JOIN (T2 LEFT JOIN " +
      " (T3 LEFT JOIN T4 ON 1=1) ON T2.X = T3.X) ON 1=1) ON 1=1) " +
      " LEFT JOIN " +
      " (T5 INNER JOIN T6 ON 1=1) " +
      " ON T2.X = 1 ");

      Cf the attachments in DERBY-4712 assert-bind-opt-trees.*.
      From preliminary analysis, this error seems to be unrelated to the NPEs reported in DERBY-4712, so filing this as a sub-issue.

      1. query_plan_derby_4736.pdf
        86 kB
        Nirmal Fernando
      2. loj-analysis.txt
        4 kB
        Dag H. Wanvik
      3. derby-4736-followup-a.stat
        0.2 kB
        Dag H. Wanvik
      4. derby-4736-followup-a.diff
        3 kB
        Dag H. Wanvik
      5. derby-4736-1d.stat
        0.2 kB
        Dag H. Wanvik
      6. derby-4736-1d.diff
        10 kB
        Dag H. Wanvik
      7. derby-4736-1c.stat
        0.2 kB
        Dag H. Wanvik
      8. derby-4736-1c.diff
        9 kB
        Dag H. Wanvik
      9. derby-4736-1b.stat
        0.2 kB
        Dag H. Wanvik
      10. derby-4736-1b.diff
        9 kB
        Dag H. Wanvik
      11. derby-4736-1a.stat
        0.2 kB
        Dag H. Wanvik
      12. derby-4736-1a.diff
        8 kB
        Dag H. Wanvik

        Issue Links

          Activity

          Hide
          Kathey Marsden added a comment -

          Closing after backport to 10.5. Note only code change was backported as OuterJoinTest is not in 10.5 branch.

          Show
          Kathey Marsden added a comment - Closing after backport to 10.5. Note only code change was backported as OuterJoinTest is not in 10.5 branch.
          Hide
          Kathey Marsden added a comment -

          Assign to myself for backport

          Show
          Kathey Marsden added a comment - Assign to myself for backport
          Hide
          Dag H. Wanvik added a comment -

          Backported to 10.6 branch as svn 997790.

          Show
          Dag H. Wanvik added a comment - Backported to 10.6 branch as svn 997790.
          Hide
          Dag H. Wanvik added a comment -

          Committed follow-up patch as snv 989099, closing again.

          Show
          Dag H. Wanvik added a comment - Committed follow-up patch as snv 989099, closing again.
          Hide
          Dag H. Wanvik added a comment -

          Attaching a patch which fixes the nullability issue and adds a new test case.

          The problem is that the added call to SelectNode#bindResultColumns, in addition to calling
          fromList.bindResultColumns, which what we need in for this issue, also calls super.bindResultColumns,
          which sets up the datatypes over again, erroneously (i.e. without taking into consideration the nature of outer join which can make values stemming from otherwise NOT NULL columns be null in the final result set).

          Replacing the call to SelectNode#bindResultColumns with fromTable.bindResultColumns avoids this problem. Running regressions.

          Show
          Dag H. Wanvik added a comment - Attaching a patch which fixes the nullability issue and adds a new test case. The problem is that the added call to SelectNode#bindResultColumns, in addition to calling fromList.bindResultColumns, which what we need in for this issue, also calls super.bindResultColumns, which sets up the datatypes over again, erroneously (i.e. without taking into consideration the nature of outer join which can make values stemming from otherwise NOT NULL columns be null in the final result set). Replacing the call to SelectNode#bindResultColumns with fromTable.bindResultColumns avoids this problem. Running regressions.
          Hide
          Dag H. Wanvik added a comment -

          In some cases, with this fix, the nullability of columns from the null-producing (right) side of the LOJ
          gets set to NOT NULL after reassociation.

          Show
          Dag H. Wanvik added a comment - In some cases, with this fix, the nullability of columns from the null-producing (right) side of the LOJ gets set to NOT NULL after reassociation.
          Hide
          Dag H. Wanvik added a comment -

          Committed derby-4736-1d as svn 987539, closing.

          Show
          Dag H. Wanvik added a comment - Committed derby-4736-1d as svn 987539, closing.
          Hide
          Dag H. Wanvik added a comment -

          Uploading version 1d of this patch, which just improves legibility a bit in the new RuntimeStatisticsParser#assertSequence argument strings, by interpreting "_" as TAB and removing the explicit newline in favor if it being implicit.

          I think this patch is pretty safe, so if there are no objections, I'll commit it tomorrow.

          Show
          Dag H. Wanvik added a comment - Uploading version 1d of this patch, which just improves legibility a bit in the new RuntimeStatisticsParser#assertSequence argument strings, by interpreting "_" as TAB and removing the explicit newline in favor if it being implicit. I think this patch is pretty safe, so if there are no objections, I'll commit it tomorrow.
          Hide
          Dag H. Wanvik added a comment -

          Thanks for looking at this, Rick.
          Actually, I don't replace the binding of expressions, I just added a call to bind the result columns which was missing and necessary, cf the explanation here: https://issues.apache.org/jira/browse/DERBY-4736?focusedCommentId=12886049&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12886049

          (JoinNode#bindResultColumns is not called as part of JoinNode#bindExpressions.)

          Show
          Dag H. Wanvik added a comment - Thanks for looking at this, Rick. Actually, I don't replace the binding of expressions, I just added a call to bind the result columns which was missing and necessary, cf the explanation here: https://issues.apache.org/jira/browse/DERBY-4736?focusedCommentId=12886049&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12886049 (JoinNode#bindResultColumns is not called as part of JoinNode#bindExpressions.)
          Hide
          Rick Hillegas added a comment -

          Thanks for the patch, Dag. It's a hopeful sign that this only involves a 1 line change to the engine code. If I understand this change correctly, it seems that you have replaced a full rebinding of all expressions with a more limited rebinding of just the result columns. I would think that a full rebinding would include the more limited rebinding of the result columns. Since you have just been through this pinball machine, can you explain why this fixes the problem? Thanks.

          Show
          Rick Hillegas added a comment - Thanks for the patch, Dag. It's a hopeful sign that this only involves a 1 line change to the engine code. If I understand this change correctly, it seems that you have replaced a full rebinding of all expressions with a more limited rebinding of just the result columns. I would think that a full rebinding would include the more limited rebinding of the result columns. Since you have just been through this pinball machine, can you explain why this fixes the problem? Thanks.
          Hide
          Dag H. Wanvik added a comment -

          Actually, it was Knut's simplified query that got an erroneous reordering.
          If I use the original query from DERBY-4712, we get a correct reordering and the ASSERT
          error seen. So, we will address the wrong reordering in DERBY-4471 and use the original query
          in our test for this issue.

          Uploading DERBY-4736-1c using the original query, checking that we get the correct result and also that the expected reordering takes place. Running regressions, ready for review.

          Show
          Dag H. Wanvik added a comment - Actually, it was Knut's simplified query that got an erroneous reordering. If I use the original query from DERBY-4712 , we get a correct reordering and the ASSERT error seen. So, we will address the wrong reordering in DERBY-4471 and use the original query in our test for this issue. Uploading DERBY-4736 -1c using the original query, checking that we get the correct result and also that the expected reordering takes place. Running regressions, ready for review.
          Hide
          Nirmal Fernando added a comment -

          Hi,

          I'm attaching the query plan created using PlanExporter tool (DERBY-4587)
          (I created a PDF, using screen shots of the generated HTML), for the query
          SELECT count FROM (T2 LEFT JOIN (T3 left outer JOIN T4 ON 1=1) ON T2.X = T3.X) ,
          since I saw Dag's comment saying incorrect results returned for that query.

          Hope this helps!

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi, I'm attaching the query plan created using PlanExporter tool ( DERBY-4587 ) (I created a PDF, using screen shots of the generated HTML), for the query SELECT count FROM (T2 LEFT JOIN (T3 left outer JOIN T4 ON 1=1) ON T2.X = T3.X) , since I saw Dag's comment saying incorrect results returned for that query. Hope this helps! Thanks.
          Hide
          Dag H. Wanvik added a comment -

          The analysis in "loj-analysis.txt" which I upload, shows why the result gets different (and wrong) with the reordering, by spelling out the intermediate results of the two orderings variants.

          Show
          Dag H. Wanvik added a comment - The analysis in "loj-analysis.txt" which I upload, shows why the result gets different (and wrong) with the reordering, by spelling out the intermediate results of the two orderings variants.
          Hide
          Dag H. Wanvik added a comment - - edited

          This smaller part of the query gives a wrong result, too, unless I comment out the reorder logic:

          SELECT count FROM (T2 LEFT JOIN (T3 left outer JOIN T4 ON 1=1) ON T2.X = T3.X)

          returns 32 rows, but should return 20, and does if I comment out the reorder logic. So does PostgreSQL. Manually rewriting this query to be left-deep gives 32 rows with both Derby and PostgreSQL. DERBY-4471?
          (Answer: yes. Edited dhw 2010-08-20)

          Show
          Dag H. Wanvik added a comment - - edited This smaller part of the query gives a wrong result, too, unless I comment out the reorder logic: SELECT count FROM (T2 LEFT JOIN (T3 left outer JOIN T4 ON 1=1) ON T2.X = T3.X) returns 32 rows, but should return 20, and does if I comment out the reorder logic. So does PostgreSQL. Manually rewriting this query to be left-deep gives 32 rows with both Derby and PostgreSQL. DERBY-4471 ? (Answer: yes. Edited dhw 2010-08-20)
          Hide
          Dag H. Wanvik added a comment -

          Uploading a new version of the preliminary patch, derby-4736-1b, which makes the test
          correctly expect a row count of 1280 (the same as postgresql). By commenting out the LOJ reordering code in SelectNode, Derby produces the same number of rows also, so there is another bug in the reordering logic somewhere. The test currently fails, although the assert is gone.

          Show
          Dag H. Wanvik added a comment - Uploading a new version of the preliminary patch, derby-4736-1b, which makes the test correctly expect a row count of 1280 (the same as postgresql). By commenting out the LOJ reordering code in SelectNode, Derby produces the same number of rows also, so there is another bug in the reordering logic somewhere. The test currently fails, although the assert is gone.
          Hide
          Dag H. Wanvik added a comment - - edited

          Attaching a patch which makes the ASSERT go away and the query complete, as well as a new fixture to OuterJoinTest: testDerby4736.
          The problem is that a LOJ reordering has happened (cf. assert-bind-opt-trees.txt: positions of T2, T3 and T4),
          the result column lists are regenerated for the affected join nodes.

          The change is detected in SelectNode#preprocess, cf. this if-statement, ca line 997:

          if (anyChange)

          { FromList afromList = (FromList) getNodeFactory().getNode(C_NodeTypes.FROM_LIST, getNodeFactory().doJoinOrderOptimization(), getContextManager()); bindExpressions(afromList); }

          However, the call to bindExpressions does not lead to rebinding of the join clause ON T2.X = 1, since the join clauses are
          bound as part of JoinNode#BindResultColumns, cf call to deferredBindExpressions. Adding the call

          bindResultColumns(aFromList)

          to the above if-block makes the ASSERT go away.

          I am not convinced that the execution results are correct, though, since it returns a different number of rows that does PostgreSQL.
          I will investigate.

          Show
          Dag H. Wanvik added a comment - - edited Attaching a patch which makes the ASSERT go away and the query complete, as well as a new fixture to OuterJoinTest: testDerby4736. The problem is that a LOJ reordering has happened (cf. assert-bind-opt-trees.txt: positions of T2, T3 and T4), the result column lists are regenerated for the affected join nodes. The change is detected in SelectNode#preprocess, cf. this if-statement, ca line 997: if (anyChange) { FromList afromList = (FromList) getNodeFactory().getNode(C_NodeTypes.FROM_LIST, getNodeFactory().doJoinOrderOptimization(), getContextManager()); bindExpressions(afromList); } However, the call to bindExpressions does not lead to rebinding of the join clause ON T2.X = 1, since the join clauses are bound as part of JoinNode#BindResultColumns, cf call to deferredBindExpressions. Adding the call bindResultColumns(aFromList) to the above if-block makes the ASSERT go away. I am not convinced that the execution results are correct, though, since it returns a different number of rows that does PostgreSQL. I will investigate.
          Show
          Dag H. Wanvik added a comment - - edited for preliminary analysis, see https://issues.apache.org/jira/browse/DERBY-4712?focusedCommentId=12885701&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12885701

            People

            • Assignee:
              Dag H. Wanvik
              Reporter:
              Dag H. Wanvik
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development