  1. IMPALA
  2. IMPALA-4986 Use Parquet statistics when evaluating min/max/count aggregates
  3. IMPALA-5621

Apply Parquet stats optimizations in conjunction with predicates against Parquet stats

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Backend
    • None

    Description

      Impala can already skip RowGroups based on predicates evaluated against Parquet statistics. For RowGroups that fully satisfy the predicates, it should also use the data stored in the Parquet statistics to speed up the query.

      select count(*), max(ss_item_sk) from store_sales where ss_item_sk > 10 and ss_item_sk < 9999999999;

      For RowGroups that have min(ss_item_sk) > 10 and max(ss_item_sk) < 9999999999, every row with a non-NULL ss_item_sk satisfies the predicate, so the scanner should use the count stored in the stats instead of evaluating each row in the RowGroup. The same applies to min/max values.
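
      The proposed behaviour can be sketched roughly as follows. This is an illustration only, not Impala's scanner code: `ColumnStats`, `Action`, `classify` and `count_and_max` are hypothetical names, and the sketch assumes each RowGroup exposes min, max, total value count and null count for the column. A RowGroup is skipped when no row can match, answered from statistics when every non-NULL row must match, and scanned row by row otherwise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Action(Enum):
    SKIP = "skip"              # no row can match the predicate
    STATS_ONLY = "stats_only"  # every non-NULL row matches; use stats
    SCAN = "scan"              # some rows may match; evaluate each row


@dataclass
class ColumnStats:
    # Hypothetical per-RowGroup statistics for a single column.
    min_value: Optional[int]
    max_value: Optional[int]
    num_values: int   # total values, including NULLs
    null_count: int


def classify(stats: ColumnStats, lo: int, hi: int) -> Action:
    """Classify a RowGroup against the predicate lo < col < hi."""
    if stats.num_values - stats.null_count == 0:
        return Action.SKIP  # NULLs never satisfy a comparison
    if stats.min_value is None or stats.max_value is None:
        return Action.SCAN  # statistics missing, cannot decide
    if stats.max_value <= lo or stats.min_value >= hi:
        return Action.SKIP
    if stats.min_value > lo and stats.max_value < hi:
        return Action.STATS_ONLY
    return Action.SCAN


def count_and_max(
    row_groups: List[Tuple[ColumnStats, List[Optional[int]]]],
    lo: int,
    hi: int,
) -> Tuple[int, Optional[int]]:
    """Compute count(*) and max(col) for rows with lo < col < hi."""
    count, best = 0, None
    for stats, rows in row_groups:
        action = classify(stats, lo, hi)
        if action is Action.SKIP:
            continue
        if action is Action.STATS_ONLY:
            # Answer from statistics without touching the rows.
            matched = stats.num_values - stats.null_count
            group_max = stats.max_value
        else:
            hits = [v for v in rows if v is not None and lo < v < hi]
            matched = len(hits)
            group_max = max(hits) if hits else None
        count += matched
        if group_max is not None and (best is None or group_max > best):
            best = group_max
    return count, best
```

      In this sketch NULL values are excluded from the statistics-based count, because a NULL ss_item_sk cannot satisfy either comparison in the predicate.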

          People

            Assignee: Unassigned
            Reporter: Mostafa Mokhtar (mmokhtar)