Apache Drill
DRILL-684

Use the Parquet row count in cost-based optimization, and use the Parquet row count and column value counts to optimize the count() aggregate function.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels:
      None

      Description

      The Parquet group scan provides the exact row count as well as the exact value count of each individual column. This information could be leveraged in two ways:

      1. Use the row count in cost estimation when a query references Parquet files.

      2. Use the row count or a column's value count to optimize the count() aggregate function.
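      Point 1 can be illustrated with a minimal Java sketch. It assumes hypothetical classes RowGroupMetadata and ScanStatistics (not Drill's actual API) and assumes each row group exposes its row count and, per column, its number of non-null values. The group scan sums these across every row group it reads, and the summed row count becomes the exact row-count estimate the planner uses when costing the scan.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-row-group footer metadata (not Drill's actual classes):
// the row count plus, for each column, the number of non-null values.
final class RowGroupMetadata {
    final long rowCount;
    final Map<String, Long> nonNullValueCounts;

    RowGroupMetadata(long rowCount, Map<String, Long> nonNullValueCounts) {
        this.rowCount = rowCount;
        this.nonNullValueCounts = nonNullValueCounts;
    }
}

// Hypothetical scan-level statistics, summed over every row group the scan reads.
final class ScanStatistics {
    final long rowCount;
    final Map<String, Long> columnValueCounts;

    private ScanStatistics(long rowCount, Map<String, Long> columnValueCounts) {
        this.rowCount = rowCount;
        this.columnValueCounts = Collections.unmodifiableMap(columnValueCounts);
    }

    static ScanStatistics fromRowGroups(List<RowGroupMetadata> rowGroups) {
        long rows = 0;
        Map<String, Long> counts = new HashMap<>();
        for (RowGroupMetadata rowGroup : rowGroups) {
            rows += rowGroup.rowCount;
            // A column absent from a row group contributes nothing, assuming a
            // missing column reads as NULL and so adds no non-null values.
            for (Map.Entry<String, Long> entry : rowGroup.nonNullValueCounts.entrySet()) {
                counts.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return new ScanStatistics(rows, counts);
    }

    // Exact row count handed to the cost model in place of a heuristic guess.
    double estimatedRowCount() {
        return rowCount;
    }
}
```

      Because the counts come from metadata already stored per row group, building these statistics takes time proportional to the number of row groups, not the number of rows.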

      For instance:

      select count(*) from parquet_file;
      select count(column_a) from parquet_file;

      The first query could be rewritten to return the row count directly, and the second could return the value count for column_a. In both cases the query would avoid scanning the Parquet files entirely, which improves query performance.
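      The rewrite of these two queries can be sketched in Java as follows. CountFromMetadata and tryAnswer are hypothetical names, not Drill's API. The sketch assumes the metadata exposes, per column, the number of non-null values (count(column_a) ignores NULLs, while count(*) counts every row), and it only applies when the query has no filter or grouping, since the footer counts describe entire files.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch (not Drill's API): answer count(*) or count(column)
// from Parquet footer metadata instead of scanning the files.
final class CountFromMetadata {
    private CountFromMetadata() {}

    /**
     * @param column             null for count(*), otherwise the counted column
     * @param rowCount           total rows across all scanned row groups
     * @param nonNullValueCounts per-column non-null value counts across all row groups
     * @param hasFilterOrGroupBy true if the query filters or groups rows
     * @return the exact count, or empty if the query must fall back to a scan
     */
    static Optional<Long> tryAnswer(String column, long rowCount,
                                    Map<String, Long> nonNullValueCounts,
                                    boolean hasFilterOrGroupBy) {
        if (hasFilterOrGroupBy) {
            // Footer counts describe whole files, so they say nothing about a subset of rows.
            return Optional.empty();
        }
        if (column == null) {
            // count(*) counts every row, including rows whose columns are all NULL.
            return Optional.of(rowCount);
        }
        // count(column) counts non-null values; a column with no recorded count falls back to a scan.
        return Optional.ofNullable(nonNullValueCounts.get(column));
    }
}
```

      When tryAnswer returns an empty Optional, the planner would simply keep the original scan-and-aggregate plan, so the optimization never changes a query's result.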


          People

          • Assignee: Jinfeng Ni
          • Reporter: Jinfeng Ni
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved:
