Apache Arrow / ARROW-3020

[Python] Addition of option to allow empty Parquet row groups

Details

    Description

      While our use case is not common, I was able to find one related request from roughly a year ago. Could this be added as a feature?

      https://issues.apache.org/jira/browse/PARQUET-1047

      Motivation

      We have an application in which each row is associated with one of N contexts, although a minority of contexts may have no associated rows. On encountering the nth context, we want to retrieve all of its associated rows. Row groups would provide a natural index for the data, since the nth context could map directly to the nth row group.

      Unfortunately, this is not currently possible, because pyarrow does not support writing empty row groups. If one writes a pyarrow.Table containing zero rows using pyarrow.parquet.ParquetWriter, no row group is written to the final file, so every later row group shifts down by one position and the context-to-row-group correspondence is lost.
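
      To make the index shift concrete, here is a minimal sketch in plain Python. It makes no pyarrow calls; it only models the behavior described above, where a zero-row table produces no row group. The helper name `row_group_positions` and the row counts are hypothetical and chosen for illustration.

```python
from typing import List, Optional


def row_group_positions(rows_per_context: List[int]) -> List[Optional[int]]:
    """Map each context index to the row group it lands in, or None.

    Models the behavior described in this issue: a context with zero
    rows produces no row group, so later row groups shift down.
    """
    positions: List[Optional[int]] = []
    next_group = 0
    for n_rows in rows_per_context:
        if n_rows == 0:
            positions.append(None)  # empty table: no row group written
        else:
            positions.append(next_group)
            next_group += 1
    return positions


# Hypothetical row counts for four contexts; context 1 has no rows.
rows_per_context = [3, 0, 2, 5]
positions = row_group_positions(rows_per_context)
print(positions)  # [0, None, 1, 2]
```

      Context 2 ends up in row group 1 and context 3 in row group 2, so reading "the nth row group" no longer returns the rows of the nth context. With an option to write empty row groups, the mapping would simply be the identity.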

            People

              wesm Wes McKinney
              AlexMendelson Alex Mendelson

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m