PARQUET-2473

Clarify parquet-format with respect to repeated fields across boundaries

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-site
    • Labels: None

    Description

      Several implementors have reported that the Parquet spec is unclear about when repeated fields can span page boundaries, i.e., whether a logical record can be split across a page and/or row group boundary.

       

      Discussion on list: https://lists.apache.org/thread/rd8twnvg4bg3558r507rzpxckcxt5wdn

       

      The conclusion seems to be that records cannot be split across boundaries when "v2 data pages" are used or when a page index is present.
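
To make the rule concrete, here is a minimal sketch in Python. It assumes each page is modeled as a plain list of already-decoded repetition levels, and relies on the Dremel convention used by Parquet that a repetition level of 0 marks the start of a new record. The function names are hypothetical and are not part of any Parquet library.

```python
from typing import List


def pages_start_on_record_boundaries(pages: List[List[int]]) -> bool:
    """Return True if every non-empty page begins with repetition level 0.

    A page whose first repetition level is greater than 0 continues a
    record that started on the previous page, i.e. the record is split
    across a page boundary.
    """
    return all(page[0] == 0 for page in pages if page)


def count_records(pages: List[List[int]]) -> int:
    """Count records by counting values whose repetition level is 0."""
    return sum(1 for page in pages for level in page if level == 0)


# Two records of a repeated field: [1, 2, 3] and [4, 5].
# Their repetition levels are 0, 1, 1 and 0, 1.
aligned_pages = [[0, 1, 1], [0, 1]]  # page break falls between records
split_pages = [[0, 1], [1, 0, 1]]    # page break falls inside the first record

print(pages_start_on_record_boundaries(aligned_pages))
print(pages_start_on_record_boundaries(split_pages))
print(count_records(aligned_pages), count_records(split_pages))
```

Both layouts hold the same two records; only the first keeps each record within a single page, which is the layout the conclusion above would require for v2 data pages or when a page index is present.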

       

      We should update the spec to state this explicitly.

            People

              Assignee: Andrew Lamb (alamb)
              Reporter: Andrew Lamb (alamb)
              Votes: 0
              Watchers: 2
