Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-5789

Query with two nested subqueries where the inner-most references the outer-most table returns wrong result

    XMLWordPrintableJSON

Details

    Description

      Problem can be reproduced with the following query (to be added e.g. in sub-query.iq):

      select deptno from dept d1 where exists (
       select 1 from dept d2 where d2.deptno = d1.deptno and exists (
        select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
      

      The problem appears with at least two nested subqueries, when the inner most references the outermost table, in our case and d3.dname = d1.dname (if we remove this expression, the problem does not occur).

      When the above query is processed, the following plan is generated (notice how the top-most projection contains two fields instead of one):

      // Plan2 (Decorrelator output)
      LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
        LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
          LogicalProject(DEPTNO=[$0], DNAME=[$1])
            LogicalTableScan(table=[[scott, DEPT]])
          LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
            LogicalJoin(condition=[true], joinType=[inner])
              LogicalTableScan(table=[[scott, DEPT]])
              LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
                LogicalFilter(condition=[IS NOT NULL($1)])
                  LogicalTableScan(table=[[scott, DEPT]])
      

      Even if this minimalist quidem test does not fail at execution time, it leads to a situation where the RelRoot#validatedRowType and the RelRoot#rel#rowType don't match (see attached screenshot) , which can lead to unforeseeable consequences, including errors/exceptions in downstream projects (for the record, I have seen more complex queries which do fail at execution time because of this issue).

      The culprit that generates the unexpected extra field in the final projection is RelDecorrelator, however I think the decorrelator is not to blame here, because the input that reaches it is already wrong:

      // Plan1 (SubQueryRemoveRule output, Decorrelator input)
      LogicalProject(DEPTNO=[$0])
        LogicalProject(DEPTNO=[$0], DNAME=[$1])
          LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0, 1}])
            LogicalProject(DEPTNO=[$0], DNAME=[$1])
              LogicalTableScan(table=[[scott, DEPT]])
            LogicalAggregate(group=[{0}])
              LogicalProject(i=[true])
                LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
                  LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
      ***           LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor1, $cor0]])
                      LogicalTableScan(table=[[scott, DEPT]])
                      LogicalAggregate(group=[{0}])
                        LogicalProject(i=[true])
      ***                 LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
                            LogicalTableScan(table=[[scott, DEPT]])
      

      Notice how there is a correlated variable $cor1 used by the LogicalJoin, which is not generated by any LogicalCorrelate.
      Who generates this wrong plan? SubQueryRemoveRule, which takes the original plan and removes the nested subqueries with the questionable above result:

      // Plan0 (Original, SubQueryRemoveRule input)
      LogicalProject(DEPTNO=[$0])
        LogicalFilter(condition=[EXISTS({
          LogicalFilter(condition=[AND(=($0, $cor0.DEPTNO), EXISTS({
            LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
              LogicalTableScan(table=[[scott, DEPT]])
          }))], variablesSet=[[$cor1]])
            LogicalTableScan(table=[[scott, DEPT]])
        })], variablesSet=[[$cor0]])
          LogicalTableScan(table=[[scott, DEPT]])
      

      When processing the original plan, CoreRules.FILTER_SUB_QUERY_TO_CORRELATE correctly converts the first subquery into a correlation with $cor0. However, when processing the second (inner-most) subquery, we reach this code in SubQueryRemoveRule#matchFilter method:

      final Set<CorrelationId>  variablesSet =
                RelOptUtil.getVariablesUsed(e.rel);
      

      Which brings a set with TWO variables ($cor0 and $cor1), because the subquery RelNode references both: AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME)). However $cor0 does not belong to the "current context", since it is the variable of the first subquery.
      My hypothesis is that the root cause is precisely the presence of the first subquery variable ($cor0) in the variableSet of the second subquery; if at this point the variable set would be simply $cor1, the problem would not occur.

      When reaching the SubQueryRemoveRule#rewriteExists with this variablesSet of two elements:

      builder.join(JoinRelType.INNER, builder.literal(true), variablesSet);
      

      The correlation is not introduced because of this check in RelBuilder:

        public RelBuilder join(JoinRelType joinType, RexNode condition,
            Set<CorrelationId> variablesSet) {
          ...
          final boolean correlate = checkIfCorrelated(variablesSet, joinType, left.rel, right.rel);
          ...
        }
      
        private boolean checkIfCorrelated(Set<CorrelationId> variablesSet,
            JoinRelType joinType, RelNode leftNode, RelNode rightRel) {
          if (variablesSet.size() != 1) {
            return false;
          }
          ...
        }
      

      Attachments

        1. debugger.png
          55 kB
          Ruben Q L

        Issue Links

          Activity

            People

              rubenql Ruben Q L
              rubenql Ruben Q L
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: