Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-2630

Convert SqlInOperator to In-Expression

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.17.0
    • None
    • core
    • None

    Description

      Currently Calcite translate "IN" to "OR" expression when the count of  IN's operands less than "inSubQueryThreshold" or  to "Join" when the operands count greater  than "inSubQueryThreshold" to get better performance.

        However this translation to "JOIN" is so complex. Especially when the "IN" expression located in the "select" or "join on condition".

      For example:

      select case when deptno in (1,2) then 0 else 1 end from emp
      

      the logical plan generated as follow:

      LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
      LogicalJoin(condition=[=($11, $12)], joinType=[left])
       LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], DEPTNO0=[$7])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
          LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
           LogicalProject(ROW_VALUE=[$0], $f1=[true])
            LogicalValues(tuples=[[{ 1 }, { 2 }]])
        LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
          LogicalProject(ROW_VALUE=[$0], $f1=[true])
            LogicalValues(tuples=[[{ 1 }, { 2 }]])
      
      

      The generated logical plan is so complex for such a simple sql!

      I think we can treat "IN" as a function like "plus" and "minus".So there is no translation on "IN" and just keep it as it is.This would be much clear in the logical plan!

      In the execute stage,We can provide a "InExpression":

      InExpression(left,condition0,condition1,...) 

       We can put all the constant conditions to a "Set".In that way,the computational complexity can reduce from O(n)to O(1).

      It would be much clear and have a good performance. 

      PS: "In sub-query" is not included in our talk.

      Attachments

        1. image-2020-07-07-21-08-30-242.png
          119 kB
          Jiatao Tao
        2. image-2020-07-07-21-08-28-667.png
          119 kB
          Jiatao Tao
        3. image-2020-07-07-21-08-24-015.png
          119 kB
          Jiatao Tao

        Issue Links

          Activity

            People

              pzw2018 pengzhiwei
              pzw2018 pengzhiwei
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated: