Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- 1.2.0
- None
Description
The optimizer currently does not eagerly remove unused attributes.
For example, given a table tab5 with five attributes a, b, c, d, e, the following query
SELECT x.a, y.b FROM tab5 AS x, tab5 AS y WHERE x.a = y.a
would result in the non-optimized plan:
LogicalProject(a=[$0], b=[$6])
  LogicalFilter(condition=[=($0, $5)])
    LogicalJoin(condition=[true], joinType=[inner])
      LogicalTableScan(table=[[tab5]])
      LogicalTableScan(table=[[tab5]])
and the optimized plan:
DataSetCalc(select=[a, b0 AS b])
  DataSetJoin(where=[=(a, a0)], join=[a, b, c, d, e, a0, b0, c0, d0, e0], joinType=[InnerJoin])
    DataSetScan(table=[[_DataSetTable_0]])
    DataSetScan(table=[[_DataSetTable_0]])
This plan is inefficient because the join carries all ten attributes (five from each input) instead of eagerly projecting out the unused fields (x.b, x.c, x.d, x.e, y.c, y.d, y.e) before the join. Only x.a, y.a and y.b are actually referenced by the query.
Since this is one of the most common optimizations, I would assume that Calcite provides some rules to extract eager projections. If this is the case, the issue can be solved by adding such rules to FlinkRuleSets.
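To make the expected pruning concrete, here is a minimal sketch in plain Python. It is not Flink or Calcite code: the helper `required_join_fields` is a hypothetical name introduced only for illustration, and it assumes that field references index the concatenated join row with the left input's fields first, as the `$0`, `$5`, `$6` references in the logical plan suggest.

```python
# Hypothetical helper, not part of Flink or Calcite: it only illustrates
# which fields of each join input are referenced above the join.

def required_join_fields(left_fields, right_fields, referenced):
    """Map indices into the joined row back to per-input field names.

    `referenced` holds indices into the concatenated row (left fields
    first), assumed to match how $0, $5, $6 index the join output.
    """
    n_left = len(left_fields)
    left_needed, right_needed = [], []
    for idx in sorted(set(referenced)):
        if idx < n_left:
            left_needed.append(left_fields[idx])
        else:
            right_needed.append(right_fields[idx - n_left])
    return left_needed, right_needed


tab5 = ["a", "b", "c", "d", "e"]

# The projection reads $0 and $6; the filter condition reads $0 and $5.
referenced = {0, 6} | {0, 5}

left_needed, right_needed = required_join_fields(tab5, tab5, referenced)
left_unused = [f for f in tab5 if f not in left_needed]
right_unused = [f for f in tab5 if f not in right_needed]

print("x keeps:", left_needed, "drops:", left_unused)
print("y keeps:", right_needed, "drops:", right_unused)
```

Under these assumptions, x only needs a and y only needs a and b, so a projection placed below the join could drop the remaining seven fields listed in the description.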
Issue Links
- blocks
- FLINK-3848 Add ProjectableTableSource interface and translation rule
- Closed