Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1493

Wrong plan for NOT IN correlated queries

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.11.0
    • core

    Description

      Plan generated by calcite with SubqueryRemoveRule followed by de-correlation for the following query:

       select sal from emp where empno NOT IN (select deptno from dept where emp.job = dept.name) 

      is

      LogicalProject(SAL=[$5])
        LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
          LogicalFilter(condition=[IS NULL($11)])
            LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], JOB0=[CAST($10):VARCHAR(10) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], $f2=[CAST($11):BOOLEAN])
              LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
                LogicalTableScan(table=[[CATALOG, SALES, EMP]])
                LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true])
                  LogicalAggregate(group=[{0, 1}])
                    LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1])
                      LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1])
                        LogicalProject(DEPTNO=[$0], JOB=[$2])
                          LogicalJoin(condition=[=($2, $1)], joinType=[inner])
                            LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
                            LogicalAggregate(group=[{0}])
                              LogicalProject(JOB=[$2])
                                LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      

      As you can notice instead of doing Left Outer Join Calcite is doing Inner Join. This will produce wrong results.

      Plan for same query just before SubqueryRemove Rule is:

      LogicalProject(SAL=[$5])
        LogicalFilter(condition=[NOT(IN($0, {
      LogicalProject(DEPTNO=[$0])
        LogicalFilter(condition=[=($cor0.JOB, $1)])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
      }))], variablesSet=[[$cor0]])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      

      Plan just after SubqueryRemove Rule:

      LogicalProject(SAL=[$5])
        LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
          LogicalFilter(condition=[IS NULL($10)])
            LogicalFilter(condition=[=($0, $9)])
              LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}])
                LogicalTableScan(table=[[CATALOG, SALES, EMP]])
                LogicalAggregate(group=[{0, 1}])
                  LogicalProject(DEPTNO=[$0], i=[true])
                    LogicalProject(DEPTNO=[$0])
                      LogicalFilter(condition=[=($cor0.JOB, $1)])
                        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
      

      Looking at above it seems RelDecorrelator have some issue where it is coming up with Inner Join.

      Attachments

        1. CALCITE-1493.test.patch
          2 kB
          Vineet Garg

        Activity

          People

            julianhyde Julian Hyde
            vgarg Vineet Garg
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: