Details
Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Description
The Parquet format supports indexing of pages within a column chunk, as documented here: https://github.com/apache/parquet-format/blob/master/PageIndex.md
With these indices a reader can skip individual pages that cannot match a predicate, not just whole row groups, so they can accelerate scans beyond what row group statistics alone achieve.
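To illustrate the kind of page skipping the index enables, here is a minimal Python sketch. It is not based on any Arrow or pyarrow API. The field names (`offset`, `compressed_page_size`, `first_row_index` on a page location; `null_pages`, `min_values`, `max_values` on a column index) follow the structures described in the page index spec linked above. Decoding min/max to plain integers (the spec stores them as binary) and the `pages_to_read` helper are simplifying assumptions for illustration, and locating and deserializing the index structures in the file is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageLocation:
    # Mirrors the page location entries of the offset index.
    offset: int                # byte offset of the page in the file
    compressed_page_size: int  # page size in bytes, including the page header
    first_row_index: int       # row index (within the row group) of the page's first row


@dataclass
class ColumnIndex:
    # Simplified column index: min/max are assumed already decoded to ints.
    null_pages: List[bool]
    min_values: List[Optional[int]]
    max_values: List[Optional[int]]


def pages_to_read(col_index: ColumnIndex, offset_index: List[PageLocation],
                  lo: int, hi: int) -> List[PageLocation]:
    """Hypothetical helper: keep only pages whose [min, max] overlaps [lo, hi]."""
    selected = []
    for i, loc in enumerate(offset_index):
        if col_index.null_pages[i]:
            # An all-null page cannot satisfy a range predicate.
            continue
        if col_index.max_values[i] < lo or col_index.min_values[i] > hi:
            # Page statistics prove no row in this page can match.
            continue
        selected.append(loc)
    return selected


# Example: three pages of one column chunk, filtering for 600 <= x <= 700.
offset_index = [
    PageLocation(offset=4, compressed_page_size=100, first_row_index=0),
    PageLocation(offset=104, compressed_page_size=100, first_row_index=1000),
    PageLocation(offset=204, compressed_page_size=100, first_row_index=2000),
]
col_index = ColumnIndex(
    null_pages=[False, False, False],
    min_values=[0, 500, 900],
    max_values=[499, 899, 1500],
)
hits = pages_to_read(col_index, offset_index, 600, 700)
print([p.offset for p in hits])  # only the second page overlaps the range
```

Row group statistics alone would force reading all three pages here if the row group's overall range overlapped the predicate; the per-page min/max lets the reader fetch only the byte range of the middle page.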
As far as I can tell, Arrow does not currently support these indices (I see no references to offset_index_offset, column_index_offset, etc.). Are there plans to add support, either in C++ or in pyarrow?