Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- Impala 2.9.0
- None
Description
Parquet metadata such as num_rows and parquet::Statistics can be used in several ways to speed up aggregation queries that use min/max/count. Some of these improvements need only execution-time changes; others also require query-plan modifications. The subtasks illustrate the various optimization opportunities and dimensions, and can be tackled separately.
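As a rough illustration of the idea in the description, the sketch below answers count(*), min and max for a single column from footer metadata alone. It is not Impala code: `RowGroupMeta` and `aggregate_from_metadata` are hypothetical names, and the sketch assumes every row group carries exact min/max statistics for the column.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RowGroupMeta:
    """Hypothetical stand-in for one row group's footer metadata (single column)."""
    num_rows: int
    min_value: int
    max_value: int


def aggregate_from_metadata(
    row_groups: Sequence[RowGroupMeta],
) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (count(*), min, max) using only metadata; no data pages are read."""
    count = sum(rg.num_rows for rg in row_groups)
    # Empty row groups contribute no values, so their stats are ignored.
    non_empty = [rg for rg in row_groups if rg.num_rows > 0]
    if not non_empty:
        return count, None, None
    return (
        count,
        min(rg.min_value for rg in non_empty),
        max(rg.max_value for rg in non_empty),
    )


footer = [RowGroupMeta(1000, 3, 97), RowGroupMeta(500, -4, 42)]
print(aggregate_from_metadata(footer))  # (1500, -4, 97)
```

The cost is proportional to the number of row groups rather than the number of rows, which is where the speed-up for such aggregates would come from.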
Issue Links
- is related to
  - IMPALA-7547 For distinct queries use dictionary encoded page instead of reading all data (Open)
Sub-Tasks
1. Use parquet::Statistics for simple min/max aggregates | Open | Unassigned
2. Use parquet::Statistics for min/max aggregates when only a subset of scan columns have stats | Open | Unassigned
3. Apply Parquet stats optimizations in conjunction with predicates against Parquet stats | Open | Unassigned