  1. Apache Drill
  2. DRILL-2568

New partition pruning prevents the optimization for trivial COUNT(*) queries


Details

    Description

      With the new interpreter-based partition pruning, the Filter node is not dropped from the plan even when the query has only partition filters and all of them have been pushed into the Scan. This blocks the optimization for COUNT queries against Parquet files, in which the count values are read directly from the Parquet files instead of scanning and aggregating. The ConvertCountToDirectScan rule is not applied when a Filter sits between the Scan and the Aggregate nodes, as in the following plan:

      0: jdbc:drill:zk=local> explain plan for select count(*) from dfs.`/Users/asinha/data/multilevel/parquet` where dir0=1995;
      +------------+------------+
      |    text    |    json    |
      +------------+------------+
      | 00-00    Screen
      00-01      StreamAgg(group=[{}], EXPR$0=[COUNT()])
      00-02        Project($f0=[0])
      00-03          SelectionVectorRemover
      00-04            Filter(condition=[=($0, 1995)])
      00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/multilevel/parquet/1995/Q1/orders_95_q1.parquet], ReadEntryWithPath [path=file:/Users/asinha/data/multilevel/parquet/1995/Q2/orders_95_q2.parquet], ReadEntryWithPath [path=file:/Users/asinha/data/multilevel/parquet/1995/Q3/orders_95_q3.parquet], ReadEntryWithPath [path=file:/Users/asinha/data/multilevel/parquet/1995/Q4/orders_95_q4.parquet]], selectionRoot=/Users/asinha/data/multilevel/parquet, numFiles=4, columns=[`dir0`]]])
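
      The blocking effect described above can be illustrated with a small toy model. This is only a sketch and not Drill's planner code: the `Node` class, the `fully_pushed` flag, and both functions are hypothetical names, and the rule pattern (Aggregate, an optional Project, then Scan) is an assumption made for the example. It shows a pattern-matching rule failing while a redundant Filter is present and matching once that Filter is dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One operator in a single-input toy plan (hypothetical model)."""
    kind: str
    child: Optional["Node"] = None
    # Hypothetical flag: True when a Filter tests only partition columns
    # and the Scan below it has already been pruned by that condition.
    fully_pushed: bool = False


def drop_pushed_filters(node: Optional[Node]) -> Optional[Node]:
    """Return a copy of the plan without Filters that pruning made redundant."""
    if node is None:
        return None
    child = drop_pushed_filters(node.child)
    if node.kind == "Filter" and node.fully_pushed:
        return child
    return Node(node.kind, child, node.fully_pushed)


def count_rule_matches(node: Node) -> bool:
    """Toy rule: matches only Aggregate -> [Project] -> Scan (assumed pattern)."""
    if node.kind != "Aggregate" or node.child is None:
        return False
    below = node.child
    if below.kind == "Project" and below.child is not None:
        below = below.child
    return below.kind == "Scan"


# Logical shape of the plan in this report: the Filter on dir0 is still present.
plan = Node("Aggregate",
            Node("Project",
                 Node("Filter", Node("Scan"), fully_pushed=True)))

before = count_rule_matches(plan)                      # Filter blocks the match
after = count_rule_matches(drop_pushed_filters(plan))  # Filter removed first
```

      In this model a Filter that is not marked `fully_pushed` is kept, so a query with a remaining non-partition condition would still be scanned and aggregated as usual.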
      

          People

            amansinha100 Aman Sinha
