Apache Arrow / ARROW-16991

[C++][Parquet][Python] Add Parquet ColumnIndex and OffsetIndex support

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++, Parquet, Python
    • Labels: None

    Description

      The Parquet format supports indexing of pages within a column chunk, as documented here: https://github.com/apache/parquet-format/blob/master/PageIndex.md

      These page indexes (ColumnIndex and OffsetIndex) go beyond row group statistics: they allow a reader to skip individual pages within a column chunk, which can further accelerate scanning.

      As far as I can tell, Arrow does not currently support these indexes (I see no references to offset_index_offset, column_index_offset, etc.). Are there plans to add support, either in C++ or in pyarrow?
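
      To illustrate what such support would enable, here is a minimal sketch in plain Python. It does not use any Arrow or pyarrow API. The classes are simplified stand-ins for the ColumnIndex, OffsetIndex and PageLocation structures of the Parquet format, with integer min/max values in place of the encoded binary values the format stores. The function names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageLocation:
    offset: int                # byte offset of the page in the file
    compressed_page_size: int  # size of the page in bytes
    first_row_index: int       # first row of the page within the row group


@dataclass
class ColumnIndex:
    null_pages: List[bool]  # True where a page contains only nulls
    min_values: List[int]   # per-page minimum (plain ints: an assumption)
    max_values: List[int]   # per-page maximum (plain ints: an assumption)


@dataclass
class OffsetIndex:
    page_locations: List[PageLocation]


def candidate_pages(column_index: ColumnIndex, lo: int, hi: int) -> List[int]:
    """Return indexes of pages whose [min, max] range overlaps [lo, hi]."""
    selected = []
    for i, all_null in enumerate(column_index.null_pages):
        if all_null:
            continue  # a page of nulls cannot match a range predicate
        if column_index.max_values[i] < lo or column_index.min_values[i] > hi:
            continue  # page range lies entirely outside the predicate
        selected.append(i)
    return selected


def byte_ranges(offset_index: OffsetIndex, pages: List[int]) -> List[Tuple[int, int]]:
    """Map page indexes to (offset, length) pairs that a reader would fetch."""
    return [
        (offset_index.page_locations[i].offset,
         offset_index.page_locations[i].compressed_page_size)
        for i in pages
    ]


# Hypothetical column chunk with three pages of 100 rows each.
sample_column_index = ColumnIndex(
    null_pages=[False, False, False],
    min_values=[0, 10, 20],
    max_values=[9, 19, 29],
)
sample_offset_index = OffsetIndex(
    page_locations=[
        PageLocation(offset=4, compressed_page_size=1000, first_row_index=0),
        PageLocation(offset=1004, compressed_page_size=1000, first_row_index=100),
        PageLocation(offset=2004, compressed_page_size=1000, first_row_index=200),
    ]
)

# A predicate such as 12 <= x <= 15 only needs the second page.
pages = candidate_pages(sample_column_index, 12, 15)
ranges = byte_ranges(sample_offset_index, pages)
```

      With row group statistics alone, the whole column chunk (min 0, max 29) would have to be read for this predicate. With the page indexes, only one of the three pages is needed.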

          People

            Assignee: Unassigned
            Reporter: ogego
            Votes: 0
            Watchers: 2
