Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Several implementors have reported that the Parquet spec is currently unclear about when repeated fields can span page boundaries (that is, whether a logical record can be split across a page and/or row group boundary).
Discussion on list: https://lists.apache.org/thread/rd8twnvg4bg3558r507rzpxckcxt5wdn
The conclusion seems to be that records can't be split across page boundaries for "v2 data pages" or when there is a page index.
We should clarify the spec to make this explicit.
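
To illustrate the constraint, here is a minimal sketch of a check a writer or validator might run on one column chunk. It relies on the fact that a repetition level of 0 marks the start of a new record, so a page begins on a record boundary only if its first repetition level is 0. The function name and the input shape (one list of already-decoded repetition levels per page) are assumptions made for illustration; they are not part of any Parquet library API.

```python
from typing import List


def pages_start_on_record_boundaries(pages: List[List[int]]) -> bool:
    """Return True if every non-empty page begins with repetition level 0.

    `pages` is a hypothetical input: the decoded repetition levels of each
    data page in a single column chunk, in page order.
    """
    return all(levels[0] == 0 for levels in pages if levels)


# Two records of a repeated field, [1, 2, 3] and [4, 5], have
# repetition levels 0, 1, 1, 0, 1.
aligned_pages = [[0, 1, 1], [0, 1]]   # page break falls between records
split_pages = [[0, 1], [1, 0, 1]]     # first record spills into page 2

print(pages_start_on_record_boundaries(aligned_pages))
print(pages_start_on_record_boundaries(split_pages))
```

Under the conclusion above, a layout like `split_pages` would not be allowed for v2 data pages or when a page index is written.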