Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.17.0
-
None
-
None
Description
Currently Calcite translate "IN" to "OR" expression when the count of IN's operands less than "inSubQueryThreshold" or to "Join" when the operands count greater than "inSubQueryThreshold" to get better performance.
However this translation to "JOIN" is so complex. Especially when the "IN" expression located in the "select" or "join on condition".
For example:
select case when deptno in (1,2) then 0 else 1 end from emp
the logical plan generated as follow:
LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)]) LogicalJoin(condition=[=($11, $12)], joinType=[left]) LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], DEPTNO0=[$7]) LogicalJoin(condition=[true], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)]) LogicalProject(ROW_VALUE=[$0], $f1=[true]) LogicalValues(tuples=[[{ 1 }, { 2 }]]) LogicalAggregate(group=[{0}], agg#0=[MIN($1)]) LogicalProject(ROW_VALUE=[$0], $f1=[true]) LogicalValues(tuples=[[{ 1 }, { 2 }]])
The generated logical plan is so complex for such a simple sql!
I think we can treat "IN" as a function like "plus" and "minus".So there is no translation on "IN" and just keep it as it is.This would be much clear in the logical plan!
In the execute stage,We can provide a "InExpression":
InExpression(left,condition0,condition1,...)
We can put all the constant conditions to a "Set".In that way,the computational complexity can reduce from O(n)to O(1).
It would be much clear and have a good performance.
PS: "In sub-query" is not included in our talk.
Attachments
Attachments
Issue Links
- duplicates
-
CALCITE-2706 Do not make IN conversion
- Closed
- is blocked by
-
CALCITE-2618 It is not possible to execute IN on Enumerable: "cannot translate call IN"
- Open
- is related to
-
CALCITE-2106 Where IN clause performs in memory filtration with > 20 values
- Open
-
CALCITE-2792 Stackoverflow while evaluating filter with large number of OR conditions
- Closed
- relates to
-
CALCITE-4173 Add internal SEARCH operator and Sarg literal, replacing use of IN in RexCall
- Closed