Description
When dictionary encoding is in use the lookup table should contain a distinct list of all values in the data, can skip reading the values and just read the header to get distinct values.
Realize this would be a big change to the read/scanner threads but could greatly speed up distinct queries.
Attachments
Issue Links
- relates to
-
IMPALA-4986 Use Parquet statistics when evaluating min/max/count aggregates
- Open