Description
I just noticed that NOT IN correlated subqueries produces an extra un-neccessary join after de-correlation (this is an addition to un-necessary joins reported in CALCITE-1494)
Query
select sal from emp where empno NOT IN ( select deptno from dept where emp.job = dept.name)
Plan after subquery remove rule:
LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[NOT(CASE(=($9, 0), false, IS NOT NULL($12), true, <($10, $9), true, false))]) LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}]) LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{2}]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)]) LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalFilter(condition=[=($cor0.EMPNO, $0)]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], i=[true]) LogicalProject(DEPTNO=[$0]) LogicalFilter(condition=[=($cor0.JOB, $1)]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
Plan after de-correlation
LogicalProject(SAL=[$5]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8]) LogicalFilter(condition=[NOT(CASE(=($10, 0), false, IS NOT NULL($14), true, <($11, $10), true, false))]) LogicalJoin(condition=[AND(=($0, $15), =($2, $13))], joinType=[left]) LogicalJoin(condition=[=($2, $9)], joinType=[left]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}], c=[COUNT()], ck=[COUNT($1)]) LogicalProject(JOB=[$1], DEPTNO=[$0]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalJoin(condition=[=($3, $0)], joinType=[inner]) // <== Un-necessary join LogicalProject(DEPTNO=[$0], JOB=[$1], $f2=[true]) LogicalAggregate(group=[{0, 1}]) LogicalProject(DEPTNO=[$0], JOB=[$2], i=[$1]) LogicalProject(DEPTNO=[$0], i=[true], JOB=[$1]) LogicalProject(DEPTNO=[$0], JOB=[$2]) LogicalJoin(condition=[=($2, $1)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}]) LogicalProject(JOB=[$2]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{0}]) LogicalProject(EMPNO=[$0]) LogicalTableScan(table=[[CATALOG, SALES, EMP]])
As you can see in plan after de-correlation there is an un-necessary inner join.
This is not reproducible on CALCITE-1494's branch. But since this is a separate issue from CALCITE-1494 I decided to open a separate JIRA.
Feel free to mark is duplicate or close it if you think otherwise.
Attachments
Issue Links
- is related to
-
CALCITE-2948 Complicated logical plan generated for in subquery with non-equi condition
- Reopened