Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1546

Wrong plan for NOT IN sub-queries with disjunction

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: core
    • Labels:

      Description

      Query:

      select * from emp where sal = 4 OR empno NOT IN (select deptno from dept)

      Plan:

      LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
        LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
          LogicalFilter(condition=[OR(=($5, 4), NOT(CASE(IS NOT NULL($10), true, false)))])
            LogicalJoin(condition=[=($0, $9)], joinType=[left])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
              LogicalAggregate(group=[{0, 1}])
                LogicalProject(DEPTNO=[$0], i=[true])
                  LogicalProject(DEPTNO=[$0])
                    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
      

      There is no null check branch i.e. with count(*), count(c) in the plan. This produces wrong results if deptno is null in dept.

        Activity

        Show
        vgarg Vineet Garg added a comment - Added test case https://github.com/apache/calcite/compare/master...vineetgarg02:CALCITE-1546
        Hide
        vgarg Vineet Garg added a comment -

        Problem seems to be with LogicVisitor.find which is determining the logic for NOT IN subquery to be TRUE_FALSE when it should be UNKNOWN_AS_TRUE. LogicVisitor.find determines the logic of any subquery with OR predicate to be TRUE_FALSE irrespective of IN/NOT IN/EXISTS/NOT EXISTS.

        Julian Hyde Can you take a look and suggest what could a possible fix be for this? Calling LogicVisitor.find with Logic.UNKNOWN_AS_FALSE instead of with Logic.TRUE in SubQueryRemoveRule works as a workaround but it produces very inefficient plan and I have yet to test if this workaround works in for all cases.

        Show
        vgarg Vineet Garg added a comment - Problem seems to be with LogicVisitor.find which is determining the logic for NOT IN subquery to be TRUE_FALSE when it should be UNKNOWN_AS_TRUE . LogicVisitor.find determines the logic of any subquery with OR predicate to be TRUE_FALSE irrespective of IN/NOT IN/EXISTS/NOT EXISTS. Julian Hyde Can you take a look and suggest what could a possible fix be for this? Calling LogicVisitor.find with Logic.UNKNOWN_AS_FALSE instead of with Logic.TRUE in SubQueryRemoveRule works as a workaround but it produces very inefficient plan and I have yet to test if this workaround works in for all cases.
        Hide
        julianhyde Julian Hyde added a comment -

        You're right that LogicVisitor should return UNKNOWN_AS_TRUE in this case. The fix is to carefully rework LogicVisitor so that it returns the right result based on the structure of the predicate. That is the point of LogicVisitor: to follow the structure of the predicate, and in particular the flow of UNKNOWN values and negations.

        By the way, I think it's damaging to try to create "workarounds". You may have noticed that we have been playing whack-a-mole for some time. If we are not rigorous we will create wrong or inefficient plans for other queries. We need to rigorously test each fix so that we are not continually taking two steps forward, one step back (mea culpa for the regression I introduced in CALCITE-1526).

        Show
        julianhyde Julian Hyde added a comment - You're right that LogicVisitor should return UNKNOWN_AS_TRUE in this case. The fix is to carefully rework LogicVisitor so that it returns the right result based on the structure of the predicate. That is the point of LogicVisitor: to follow the structure of the predicate, and in particular the flow of UNKNOWN values and negations. By the way, I think it's damaging to try to create "workarounds". You may have noticed that we have been playing whack-a-mole for some time. If we are not rigorous we will create wrong or inefficient plans for other queries. We need to rigorously test each fix so that we are not continually taking two steps forward, one step back (mea culpa for the regression I introduced in CALCITE-1526 ).
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/0bd4f86b .
        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.12.0 (2017-03-24).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.12.0 (2017-03-24).

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            vgarg Vineet Garg
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development