Description
The existing ArrowColumnVector creates a read-only vector of Arrow data. It would be useful to be able to create a ColumnarBatch from multiple ArrowColumnVectors to allow row-based iteration over them. This would avoid the extra copying needed to translate column elements into rows, reducing memory usage and improving performance.
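To illustrate the idea of iterating rows over column data without copying, here is a minimal, self-contained Java sketch. All class and method names (IntColumn, SimpleBatch, RowView, RowViewDemo) are hypothetical stand-ins invented for this illustration. They are not Spark's ColumnarBatch or ArrowColumnVector API, and plain int arrays stand in for Arrow buffers.

```java
// Hypothetical names throughout: this is an illustration of the concept,
// not Spark's ColumnarBatch or ArrowColumnVector API.

/** A read-only column of ints, standing in for a read-only Arrow-backed vector. */
final class IntColumn {
    private final int[] values;

    IntColumn(int[] values) {
        this.values = values;
    }

    int get(int rowId) {
        return values[rowId];
    }

    int length() {
        return values.length;
    }
}

/** A batch of equal-length columns that exposes rows as views rather than copies. */
final class SimpleBatch {
    private final IntColumn[] columns;
    private final int numRows;

    SimpleBatch(IntColumn... columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("need at least one column");
        }
        this.numRows = columns[0].length();
        for (IntColumn c : columns) {
            if (c.length() != numRows) {
                throw new IllegalArgumentException("column lengths differ");
            }
        }
        this.columns = columns;
    }

    int numRows() {
        return numRows;
    }

    /** Returns a lightweight view of one row; no column data is copied. */
    RowView row(int rowId) {
        if (rowId < 0 || rowId >= numRows) {
            throw new IndexOutOfBoundsException("row " + rowId);
        }
        return new RowView(rowId);
    }

    /** Holds only a row index and reads each field straight from the column. */
    final class RowView {
        private final int rowId;

        private RowView(int rowId) {
            this.rowId = rowId;
        }

        int getInt(int ordinal) {
            return columns[ordinal].get(rowId);
        }
    }
}

final class RowViewDemo {
    public static void main(String[] args) {
        SimpleBatch batch = new SimpleBatch(
            new IntColumn(new int[] {1, 2, 3}),
            new IntColumn(new int[] {10, 20, 30}));

        // Row-based iteration over columnar data: each field access
        // reads from the underlying column instead of a materialized row.
        for (int i = 0; i < batch.numRows(); i++) {
            SimpleBatch.RowView row = batch.row(i);
            System.out.println(row.getInt(0) + ", " + row.getInt(1));
        }
    }
}
```

In this sketch a row is only an index into the shared columns, so iterating over rows allocates a small view object per row but never duplicates the column values. That is the kind of saving the description has in mind, under the assumption that the real batch would wrap Arrow vectors in a similar way.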
Issue Links
- blocks: SPARK-20791 Use Apache Arrow to Improve Spark createDataFrame from Pandas.DataFrame (Resolved)
- is related to: SPARK-21472 Introduce ArrowColumnVector as a reader for Arrow vectors. (Resolved)