Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
Impala can skip processing blocks based on predicates evaluated against Parquet statistics. For RowGroups in which every row qualifies the predicates, the scanner should also use the data stored in the Parquet statistics to speed up the query. For example:
select count(*), max(ss_item_sk) from store_sales where ss_item_sk > 10 and ss_item_sk < 9999999999;
For RowGroups that have min(ss_item_sk) > 10 and max(ss_item_sk) < 9999999999, the scanner should use the count stored in the stats as opposed to evaluating each row in the RowGroup. The same applies to the min/max values.
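The idea can be sketched in a few lines of Python. This is only an illustration and not Impala's scanner code: the RowGroup class and the fully_qualifies and count_and_max helpers are hypothetical names. It also assumes that each RowGroup has at least one non-NULL value and that its statistics carry a null count, so that NULL rows (which never satisfy the predicate) can be subtracted from the row count.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet RowGroup with column statistics."""
    values: List[Optional[int]]  # column values; None represents NULL
    min_value: int               # min over non-NULL values (from stats)
    max_value: int               # max over non-NULL values (from stats)
    null_count: int              # number of NULLs (from stats)

    @property
    def num_rows(self) -> int:
        return len(self.values)


def fully_qualifies(rg: RowGroup, lo: int, hi: int) -> bool:
    # Every non-NULL value satisfies lo < v < hi if min > lo and max < hi.
    return rg.min_value > lo and rg.max_value < hi


def count_and_max(row_groups: List[RowGroup], lo: int, hi: int) -> Tuple[int, Optional[int]]:
    """Evaluate count(*), max(col) for the predicate col > lo and col < hi."""
    total = 0
    best: Optional[int] = None
    for rg in row_groups:
        # Skipping based on stats: no row in this RowGroup can match.
        if rg.max_value <= lo or rg.min_value >= hi:
            continue
        if fully_qualifies(rg, lo, hi):
            # Proposed shortcut: answer from statistics without reading rows.
            count = rg.num_rows - rg.null_count
            rg_max: Optional[int] = rg.max_value
        else:
            # Fallback: evaluate the predicate on each row.
            matching = [v for v in rg.values if v is not None and lo < v < hi]
            count = len(matching)
            rg_max = max(matching) if matching else None
        total += count
        if rg_max is not None and (best is None or rg_max > best):
            best = rg_max
    return total, best
```

In this sketch only RowGroups that partially overlap the predicate range are read row by row; fully covered RowGroups contribute their count and max directly from the statistics.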