Apache Arrow / ARROW-7800

[Python] Expose GetRecordBatchReader API in PyArrow

Details

    Description

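      The C++ GetRecordBatchReader API is very useful for streaming Parquet files with heavily run-length-encoded (RLE) data, since it reads a file batch by batch instead of loading it all at once.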

      I propose exposing this API in PyArrow in the following manner:

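      from pyarrow.parquet import ParquetFile

      file_ = ParquetFile('file/path.parquet')

      # get_batches is the method proposed here; it does not exist yet
      for batch in file_.get_batches(batch_size=100):
          pass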
      

      (If anyone has better ideas, let me know; I'm not 100% sold on exposing it this way.)

          People

            wjones127 Will Jones

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 10.5h