Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1493

Wrong plan for NOT IN correlated queries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11.0
    • Component/s: core
    • Labels:

      Description

      Plan generated by calcite with SubqueryRemoveRule followed by de-correlation for the following query:

       select sal from emp where empno NOT IN (select deptno from dept where emp.job = dept.name) 

      is

      LogicalProject(SAL=[$5])
        LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
          LogicalFilter(condition=[IS NULL($11)])
            LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], JOB0=[CAST($10):VARCHAR(10) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], $f2=[CAST($11):BOOLEAN])
              LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
                LogicalTableScan(table=[[CATALOG, SALES, EMP]])
                LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true])
                  LogicalAggregate(group=[{0, 1}])
                    LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1])
                      LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1])
                        LogicalProject(DEPTNO=[$0], JOB=[$2])
                          LogicalJoin(condition=[=($2, $1)], joinType=[inner])
                            LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
                            LogicalAggregate(group=[{0}])
                              LogicalProject(JOB=[$2])
                                LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      

      As you can notice instead of doing Left Outer Join Calcite is doing Inner Join. This will produce wrong results.

      Plan for same query just before SubqueryRemove Rule is:

      LogicalProject(SAL=[$5])
        LogicalFilter(condition=[NOT(IN($0, {
      LogicalProject(DEPTNO=[$0])
        LogicalFilter(condition=[=($cor0.JOB, $1)])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
      }))], variablesSet=[[$cor0]])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      

      Plan just after SubqueryRemove Rule:

      LogicalProject(SAL=[$5])
        LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
          LogicalFilter(condition=[IS NULL($10)])
            LogicalFilter(condition=[=($0, $9)])
              LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}])
                LogicalTableScan(table=[[CATALOG, SALES, EMP]])
                LogicalAggregate(group=[{0, 1}])
                  LogicalProject(DEPTNO=[$0], i=[true])
                    LogicalProject(DEPTNO=[$0])
                      LogicalFilter(condition=[=($cor0.JOB, $1)])
                        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
      

      Looking at above it seems RelDecorrelator have some issue where it is coming up with Inner Join.

        Attachments

          Activity

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              vgarg Vineet Garg
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: