Apache Drill
  DRILL-684

Use the Parquet row count in cost-based optimization. Use the Parquet row count and column value count to optimize the count() aggregate function.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels:
      None

      Description

      The Parquet group scan provides the exact row count and the exact value count for each individual column. This information could be leveraged in the following two ways:

      1. Use the row count in cost estimation when a query refers to Parquet files.

      2. Use the row count or column value count to optimize the count() aggregate function.

      For instance:
      select count(*) from parquet_file;
      select count(column_a) from parquet_file;

      The first query could be transformed to return the row count directly, and the second could return the column value count for 'column_a'. Both cases avoid scanning the whole Parquet files and thus improve query performance.
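
      The rewrite described above can be sketched as follows. This is only an illustration under stated assumptions, not Drill's implementation: `RowGroupStats`, `column_value_counts`, and `count_from_metadata` are hypothetical names, and the per-column value count is assumed to be the number of non-null values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group metadata; not Drill's actual classes."""
    row_count: int
    # Assumption: number of non-null values recorded per column.
    column_value_counts: Dict[str, int] = field(default_factory=dict)


def count_from_metadata(row_groups: List[RowGroupStats],
                        column: Optional[str] = None) -> Optional[int]:
    """Answer count(*) (column=None) or count(column) from metadata only.

    Returns None when the metadata cannot answer the query, in which
    case a planner would have to fall back to a regular scan.
    """
    if column is None:
        # count(*): sum the exact row counts of all row groups.
        return sum(rg.row_count for rg in row_groups)
    total = 0
    for rg in row_groups:
        if column not in rg.column_value_counts:
            return None  # no value count for this column: cannot rewrite
        total += rg.column_value_counts[column]
    return total


# Made-up metadata for two row groups of a hypothetical parquet_file.
stats = [
    RowGroupStats(row_count=100, column_value_counts={"column_a": 90}),
    RowGroupStats(row_count=50, column_value_counts={"column_a": 50}),
]
print(count_from_metadata(stats))              # select count(*)
print(count_from_metadata(stats, "column_a"))  # select count(column_a)
```

      The same summed row count could also serve as the row estimate in cost estimation. Returning None for an unknown column keeps the sketch conservative: the rewrite is applied only when the metadata can answer the query exactly.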

        Activity

        Jinfeng Ni added a comment -

        In addition to the code change for row count, the patch contains bug fixes:

        1) Set the type's nullable property for the extract function and for the 'any' type in a view DDL or table column list.

        2) Fix a bug in the logical/physical Project rule: set up the traits properly.

        Jacques Nadeau added a comment -

        added in cf2b888


          People

          • Assignee: Jinfeng Ni
          • Reporter: Jinfeng Ni
          • Votes: 0
          • Watchers: 2
