Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
-
None
Description
If columns in "select distinct" are already distinct, there can be two sets of equivalent rel before and after AggregateRemoveRule.
agg | input input 10.0 100.0
Based on the default implementation of rel metadata, the rowCount of the "before" rel is only 1/10 of that of the "after" rel, but meanwhile the "after" rel is definitely cheaper. So the Volcano planner would most likely either fail to pick the cheapest one or have an inconsistent state due to CALCITE-830.
An example (based EnumerableRel cost model):
The plan for
select empno, d.deptno from "scott".emp join (select distinct deptno from "scott".dept) d using (deptno);
would be
EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
EnumerableAggregate(group=[$0])
EnumerableTableScan(table=[[scott, DEPT]])
EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
EnumerableTableScan(table=[[scott, EMP]])
, while it should be
EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0])
EnumerableTableScan(table=[[scott, DEPT]])
EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
EnumerableTableScan(table=[[scott, EMP]])