Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-5655

Wrong plan for multiple IN/SOME sub-queries with OR predicate

    XMLWordPrintableJSON

Details

    Description

      When the query contains multiple IN/SOME sub-queries connected with OR predicate in WHERE clause, the result is wrong. The minimal reproducer is below:
      SQL:

      select empno from sales.empnullables
      where deptno in (
        select deptno from sales.deptnullables where name = 'dept1')
      or deptno in (
        select deptno from sales.deptnullables where name = 'dept2')
      

      The Plan generated by calcite master branch: (Notice the bold part of <>($2, 0) in the downstream LogicalFilter)

      LogicalProject(EMPNO=[$0])
        LogicalProject(EMPNO=[$0], DEPTNO=[$1])
          LogicalFilter(condition=[OR(AND(<>($2, 0), IS NOT NULL($5), IS NOT NULL($1)), AND(***<>($2, 0)***, IS NOT NULL($9), IS NOT NULL($1)))])
            LogicalJoin(condition=[=($1, $8)], joinType=[left])
              LogicalJoin(condition=[true], joinType=[inner])
                LogicalJoin(condition=[=($1, $4)], joinType=[left])
                  LogicalJoin(condition=[true], joinType=[inner])
                    LogicalProject(EMPNO=[$0], DEPTNO=[$7])
                      LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
                    LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
                      LogicalProject(DEPTNO=[$0])
                        LogicalFilter(condition=[=($1, 'dept1')])
                          LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
                  LogicalProject(DEPTNO=[$0], i=[true])
                    LogicalFilter(condition=[=($1, 'dept1')])
                      LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
                LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
                  LogicalProject(DEPTNO=[$0])
                    LogicalFilter(condition=[=($1, 'dept2')])
                      LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
              LogicalProject(DEPTNO=[$0], i=[true])
                LogicalFilter(condition=[=($1, 'dept2')])
                  LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
      

      The wrong part is that when build the downstream LogicalFilter for the two sub-queries, the filter for the second sub-query is AND(<>($2, 0), IS NOT NULL($9), IS NOT NULL($1)), notice that $2 should be the second sub-query's intermediate table field ct.c(which field index is $6), but now the actual reference is the first sub-query's, this leads to wrong plan, and wrong result.
      The root cause is that intermediate table alias is the same as the previous sub-query's, but when lookup intermediate table field, it always returns the previous one which is not belong to the current subquery.

      Attachments

        Issue Links

          Activity

            People

              Runking Runkang He
              Runking Runkang He
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 2h 20m
                  2h 20m