Details
-
New Feature
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
Description
While our use case is not common, I was able to find one related request from roughly a year ago. Could this be added as a feature?
https://issues.apache.org/jira/browse/PARQUET-1047
Motivation
We have an application where each row is associated with one of N contexts, though a minority of contexts may have no associated rows. When encountering the Nth context, we will wish to retrieve all the associated rows. Row groups would provide a natural way to index the data, as the nth context could naturally relate to the nth row group.
Unfortunately, this is not possible at the present time, as pyarrow does not support writing empty row groups. If one writes a pyarrow.Table containing zero rows using pyarrow.parquet.ParquetWriter, it is omitted from the final file, and this distorts the indexing.
Attachments
Issue Links
- links to