  1. Apache Drill
  2. DRILL-3716

Drill should push filter past aggregate in order to improve query performance.

Details

    Description

      For a query that has a filter on top of an aggregation, Drill currently does not push the filter past the aggregation. As a result, we may miss some optimization opportunities. For instance, such a filter could potentially be pushed into the scan if it qualifies for partition pruning.

      For the following query:

      select n_regionkey, cnt from 
           (select n_regionkey, count(*) cnt 
            from (select n.n_nationkey, n.n_regionkey, n.n_name 
                     from cp.`tpch/nation.parquet` n 
                        left join 
                             cp.`tpch/region.parquet` r 
                      on n.n_regionkey = r.r_regionkey) 
             group by n_regionkey) 
      where n_regionkey = 2;
      

      The current plan shows a filter (00-04) on top of the aggregation (00-05). A better plan would have the filter pushed past the aggregation.

      The root cause of this problem is that Drill's ruleset does not include FilterAggregateTransposeRule from the Calcite library.

      00-01      Project(n_regionkey=[$0], cnt=[$1])
      00-02        Project(n_regionkey=[$0], cnt=[$1])
      00-03          SelectionVectorRemover
      00-04            Filter(condition=[=($0, 2)])
      00-05              StreamAgg(group=[{0}], cnt=[COUNT()])
      00-06                Project(n_regionkey=[$0])
      00-07                  MergeJoin(condition=[=($0, $1)], joinType=[left])
      00-09                    SelectionVectorRemover
      00-11                      Sort(sort0=[$0], dir0=[ASC])
      00-13                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, columns=[`n_regionkey`]]])
      00-08                    SelectionVectorRemover
      00-10                      Sort(sort0=[$0], dir0=[ASC])
      00-12                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, columns=[`r_regionkey`]]])
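
The intended rewrite can be sketched outside Drill. The snippet below is only an illustration under stated assumptions: it uses SQLite (via Python's standard `sqlite3` module) as a stand-in for Drill, and a handful of made-up rows in hypothetical `nation` and `region` tables instead of the TPC-H Parquet files. It runs the query in its original shape and in a hand-written form with the filter moved below the aggregation, and checks that both return the same rows. The move is valid here because the condition references only the grouping column `n_regionkey`.

```python
import sqlite3

# Assumption: SQLite stands in for Drill, and these few made-up rows stand in
# for cp.`tpch/nation.parquet` and cp.`tpch/region.parquet`.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nation (n_nationkey INTEGER, n_regionkey INTEGER, n_name TEXT);
    CREATE TABLE region (r_regionkey INTEGER);
    INSERT INTO nation VALUES (0, 0, 'A'), (1, 1, 'B'), (2, 2, 'C'),
                              (3, 2, 'D'), (4, 2, 'E'), (5, 3, 'F');
    INSERT INTO region VALUES (0), (1), (2), (3);
""")

# Shared innermost subquery (the left join from the original query).
# Explicit aliases keep the output column names unambiguous in SQLite.
joined = """
    SELECT n.n_nationkey AS n_nationkey,
           n.n_regionkey AS n_regionkey,
           n.n_name      AS n_name
    FROM nation n LEFT JOIN region r
      ON n.n_regionkey = r.r_regionkey
"""

# Original shape: the filter sits on top of the aggregation.
filter_above_agg = f"""
    SELECT n_regionkey, cnt FROM
        (SELECT n_regionkey, COUNT(*) AS cnt
         FROM ({joined})
         GROUP BY n_regionkey)
    WHERE n_regionkey = 2
"""

# Hand-written equivalent: the filter is applied before grouping, so only
# rows with n_regionkey = 2 reach the aggregation.
filter_below_agg = f"""
    SELECT n_regionkey, COUNT(*) AS cnt
    FROM ({joined})
    WHERE n_regionkey = 2
    GROUP BY n_regionkey
"""

rows_above = conn.execute(filter_above_agg).fetchall()
rows_below = conn.execute(filter_below_agg).fetchall()
print(rows_above, rows_below)
```

With the filter below the aggregation, a planner is free to keep pushing it further down toward the scan, which is where partition pruning could apply.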
      

            People

              jni Jinfeng Ni