Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
When a query has a filter on top of an aggregation, Drill currently does not push the filter past the aggregation. As a result, some optimization opportunities may be missed. For instance, such a filter could potentially be pushed into the scan if it qualifies for partition pruning.
For example, consider the following query:
select n_regionkey, cnt
from (select n_regionkey, count(*) cnt
      from (select n.n_nationkey, n.n_regionkey, n.n_name
            from cp.`tpch/nation.parquet` n
            left join cp.`tpch/region.parquet` r on n.n_regionkey = r.r_regionkey)
      group by n_regionkey)
where n_regionkey = 2;
The current plan shows a filter (00-04) on top of the aggregation (00-05). A better plan would have the filter pushed past the aggregation.
The root cause of this problem is that Drill's ruleset does not include FilterAggregateTransposeRule from the Calcite library.
00-01  Project(n_regionkey=[$0], cnt=[$1])
00-02    Project(n_regionkey=[$0], cnt=[$1])
00-03      SelectionVectorRemover
00-04        Filter(condition=[=($0, 2)])
00-05          StreamAgg(group=[{0}], cnt=[COUNT()])
00-06            Project(n_regionkey=[$0])
00-07              MergeJoin(condition=[=($0, $1)], joinType=[left])
00-09                SelectionVectorRemover
00-11                  Sort(sort0=[$0], dir0=[ASC])
00-13                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, columns=[`n_regionkey`]]])
00-08                SelectionVectorRemover
00-10                  Sort(sort0=[$0], dir0=[ASC])
00-12                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, columns=[`r_regionkey`]]])
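To illustrate what pushing the filter past the aggregation means, the query below is a hand-written sketch of the logically equivalent form, with the predicate applied before the group by. It is an assumption about the intended shape and not the plan the planner actually emits. The rewrite is valid here because the filter references only the grouping column n_regionkey, which comes from the preserved (left) side of the join.

```sql
-- Hand-rewritten equivalent (illustrative only): the predicate on the
-- grouping column is applied below the aggregation instead of above it.
select n_regionkey, count(*) cnt
from (select n.n_nationkey, n.n_regionkey, n.n_name
      from cp.`tpch/nation.parquet` n
      left join cp.`tpch/region.parquet` r on n.n_regionkey = r.r_regionkey
      where n.n_regionkey = 2)
group by n_regionkey;
```

Running explain plan for on both forms should make it possible to compare where the Filter operator lands relative to StreamAgg.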
Issue Links
- is duplicated by DRILL-2748 Filter is not pushed down into subquery with the group by (Closed)