Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Impala 2.8.0
    • None
    • Backend

    Description

      select min(int_col), max(bigint_col) from parquet_table;
      select min(int_col), max(bigint_col) from parquet_table group by partition_col;
      select min(int_col), max(int_col) from parquet_table; -- slightly trickier case because int_col is referenced twice
      

      The slot values for int_col and bigint_col can be directly filled in from the parquet::Statistics, assuming stats are available for both columns. No columns need to be scanned/materialized.
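A minimal sketch of the idea, not Impala's implementation: the answer for one column is obtained by folding the per-row-group statistics and never touching the column data. `ColumnStats` and `fold_min_max` are hypothetical names introduced for illustration; `ColumnStats` stands in for the min/max fields of one column chunk's parquet::Statistics, and `None` marks a chunk without statistics.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical stand-in for one column chunk's parquet::Statistics."""
    min_value: int
    max_value: int


def fold_min_max(
    chunks: Sequence[Optional[ColumnStats]],
) -> Optional[Tuple[int, int]]:
    """Combine per-row-group stats into a (min, max) pair for the column.

    Returns None when the stats cannot answer the query (no chunks at all,
    or at least one chunk without stats); the caller would then fall back
    to a regular scan. This fallback policy is an assumption of the sketch.
    """
    if not chunks or any(c is None for c in chunks):
        return None
    return (
        min(c.min_value for c in chunks),
        max(c.max_value for c in chunks),
    )
```

For the group-by-partition query, the same fold would be applied once per partition, over the row groups of that partition's files only.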

      This JIRA focuses on implementing this optimization in the simple case where all scanned columns feed into min/max aggregates and where all columns have parquet::Statistics. Those conditions can be relaxed, but the relaxations should be addressed separately.

      This optimization opportunity must be detected by the planner and is not applicable when there are scan predicates.
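The applicability conditions stated above can be sketched as a single planner-side check. This is an illustration under stated assumptions, not Impala planner code: `AggExpr`, `ScanAggPlan`, and `can_answer_from_stats` are hypothetical names, and requiring every aggregate to be min or max is a conservative reading of "all scanned columns feed into min/max aggregates".

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AggExpr:
    """Hypothetical aggregate expression, e.g. AggExpr("min", "int_col")."""
    func: str
    column: str


@dataclass(frozen=True)
class ScanAggPlan:
    """Hypothetical summary of a scan feeding an aggregation."""
    scanned_columns: FrozenSet[str]      # columns materialized from files
    agg_exprs: Tuple[AggExpr, ...]
    has_scan_predicates: bool
    columns_with_stats: FrozenSet[str]   # columns with stats in every file


def can_answer_from_stats(plan: ScanAggPlan) -> bool:
    """Return True if the simple case described in this issue applies."""
    # Not applicable when there are scan predicates.
    if plan.has_scan_predicates:
        return False
    # Assumption of this sketch: every aggregate must be min or max.
    if any(a.func not in ("min", "max") for a in plan.agg_exprs):
        return False
    # All scanned columns must feed into min/max aggregates.
    agg_columns = {a.column for a in plan.agg_exprs}
    if not plan.scanned_columns <= agg_columns:
        return False
    # All scanned columns must have statistics available.
    return plan.scanned_columns <= plan.columns_with_stats
```

Partition columns are not read from the data files, so a group by on a partition column adds nothing to `scanned_columns` in this model, while grouping by a regular column would fail the third check. A column referenced twice, as in the third query, appears once in `scanned_columns` and passes as long as both references are min/max aggregates.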

          People

            Assignee: Unassigned
            Reporter: Alexander Behm (alex.behm)
            Votes: 0
            Watchers: 6