Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
Description
When a Parquet file is sorted, we can define an index consisting of the boundary values of the pages of the sort columns, together with the offsets and lengths of those pages in the file.
The goal is to optimize lookup and range-scan queries by using this index to read only the pages that contain data matching the filter.
We'd require the pages to be aligned across columns.
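To illustrate how such an index could serve a lookup or range scan, here is a minimal sketch under stated assumptions. It assumes integer keys sorted in ascending order and pages aligned across columns, meaning page i of every column covers the same rows. The names `PageEntry`, `SortedPageIndex`, and `byte_ranges` are hypothetical and are not the Parquet format's actual structures. Because the column is sorted, both the per-page minimums and maximums are non-decreasing, so two binary searches locate the candidate pages. Alignment then lets the same page ordinals select pages in any other column.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PageEntry:
    """Hypothetical index entry: a page's boundary values and its location in the file."""
    min_value: int
    max_value: int
    offset: int  # byte offset of the page in the file
    length: int  # byte length of the page


class SortedPageIndex:
    """Hypothetical index over the pages of a column the file is sorted on."""

    def __init__(self, entries: List[PageEntry]):
        # Assumes ascending sort order, so mins and maxs are both non-decreasing.
        self.entries = entries
        self._mins = [e.min_value for e in entries]
        self._maxs = [e.max_value for e in entries]

    def pages_for_range(self, lo: int, hi: int) -> range:
        """Ordinals of pages that may contain values in [lo, hi]."""
        # First page whose max >= lo; earlier pages hold only values < lo.
        first = bisect_left(self._maxs, lo)
        # First page whose min > hi; it and later pages hold only values > hi.
        last = bisect_right(self._mins, hi)
        return range(first, max(first, last))

    def pages_for_value(self, value: int) -> range:
        """A point lookup is a range scan over [value, value]."""
        return self.pages_for_range(value, value)


def byte_ranges(column_pages: List[Tuple[int, int]], ordinals: range) -> List[Tuple[int, int]]:
    """(offset, length) of the selected pages in another column.

    Relies on the alignment assumption: page i of this column covers the
    same rows as page i of the sorted column.
    """
    return [column_pages[i] for i in ordinals]


# Example data (made up for illustration): four pages of the sorted column.
index = SortedPageIndex([
    PageEntry(1, 10, 4, 100),
    PageEntry(10, 20, 104, 120),
    PageEntry(21, 30, 224, 90),
    PageEntry(31, 40, 314, 110),
])
# (offset, length) of the aligned pages of some other column.
other_column = [(500, 80), (580, 75), (655, 90), (745, 60)]

selected = index.pages_for_range(15, 25)
print(list(selected))                          # pages 1 and 2
print(byte_ranges(other_column, selected))     # only those pages are read
```

A lookup for 10 returns pages 0 and 1, since a value equal to a page boundary may appear on both sides of it. A value outside every page's bounds yields an empty range, so no pages are read at all.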
marcelk will add a link to the Google Doc to discuss the spec.
Issue Links
- blocks
  - PARQUET-1134 Release Parquet format 2.4.0 (Resolved)
- is depended upon by
  - PARQUET-1201 Column indexes (Resolved)
  - PARQUET-1207 Write index page in parquet file (Resolved)
- is required by
  - IMPALA-5840 Don't write page level statistics in Parquet files in anticipation of page indexes (Resolved)
  - IMPALA-5842 Write page index in Parquet files (Resolved)
- links to