Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.0.0, 3.5.1
Fix Version/s: None
None
Description
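As a follow-up to SPARK-47365:
toArrow() is useful when the data is relatively small, because it collects the entire DataFrame into a single PyArrow Table in driver memory. For larger data, the best way to return the contents of a PySpark DataFrame in Arrow format is to return an iterator of PyArrow RecordBatches, so that the caller can process one batch at a time instead of holding everything at once.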
Issue Links
- is related to
  - SPARK-48478 Allow passing iterator of PyArrow RecordBatches to createDataFrame() (Open)
- relates to
  - SPARK-47365 Add toArrow() DataFrame method to PySpark (Resolved)