PARQUET-922: Add index pages to the format to support efficient page skipping

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: format-2.4.0
    • Component/s: parquet-format
    • Labels: None

    Description

      When a Parquet file is sorted, we can define an index consisting of the boundary values of the pages of the sort columns, together with the offset and length of each of those pages in the file.
      The goal is to optimize lookup and range-scan queries by using this index to read only the pages that contain data matching the filter.
      We'd require the pages to be aligned across columns.
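
      To make the idea concrete, here is a minimal sketch of how a reader might use such an index. It is not the proposed spec: the `PageEntry` layout, the `pages_for_range` helper, and the integer column values are all assumptions made for illustration. It assumes the sort column is ascending, so the per-page minimum and maximum values are non-decreasing and can be binary searched.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class PageEntry:
    """Hypothetical index entry for one data page (not the actual spec)."""
    min_value: int  # smallest value stored in the page
    max_value: int  # largest value stored in the page
    offset: int     # byte offset of the page in the file
    length: int     # byte length of the page


def pages_for_range(index: List[PageEntry], lo: int, hi: int) -> List[int]:
    """Return positions of pages that may hold values in [lo, hi].

    Assumes the column is sorted ascending, so min_value and max_value
    are both non-decreasing from one page to the next.
    """
    maxes = [p.max_value for p in index]
    mins = [p.min_value for p in index]
    first = bisect_left(maxes, lo)   # first page whose max_value >= lo
    last = bisect_right(mins, hi)    # one past the last page whose min_value <= hi
    return list(range(first, last))


# Illustrative index for a sorted column (made-up boundaries and offsets).
sort_col = [
    PageEntry(1, 10, offset=0, length=100),
    PageEntry(10, 25, offset=100, length=120),
    PageEntry(26, 40, offset=220, length=90),
    PageEntry(41, 90, offset=310, length=150),
]
# A second column only needs offsets and lengths; boundaries are unused here.
other_col = [
    PageEntry(0, 0, offset=1000, length=50),
    PageEntry(0, 0, offset=1050, length=60),
    PageEntry(0, 0, offset=1110, length=55),
    PageEntry(0, 0, offset=1165, length=70),
]

hits = pages_for_range(sort_col, 12, 30)
# Because pages are aligned across columns, the same page positions
# select the matching byte ranges in every other column.
byte_ranges = [(other_col[i].offset, other_col[i].length) for i in hits]
```

      The alignment requirement is what lets `hits`, computed only from the sort column, be reused to pick the byte ranges of `other_col` without consulting any boundary values for that column.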

      marcelk will add a link to the Google Doc for discussing the spec.

            People

              Assignee: marcelk Marcel Kinard
              Reporter: julienledem Julien Le Dem
              Votes: 0
              Watchers: 18

              Dates

                Created:
                Updated:
                Resolved: