Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
None
Description
The GetRecordBatchReader API is really useful for streaming Parquet files that contain a lot of run-length-encoded (RLE) data.
I propose exposing this API in PyArrow in the following manner:
file_ = ParquetFile('file/path.parquet')
for batch in file_.get_batches(batch_size=100):
    pass
(If anyone has any better ideas hit me up, I'm not 100% sold on exposing it this way.)