[SPARK-21583] Create a ColumnarBatch with ArrowColumnVectors for row based iteration - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: SQL
Labels: None

Description

The existing ArrowColumnVector provides a read-only vector over Arrow data. It would be useful to be able to create a ColumnarBatch that allows row-based iteration over multiple ArrowColumnVectors. This would avoid the extra copying needed to translate column elements into rows, making memory use more efficient and improving performance.
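A minimal sketch of the idea follows, written in Python and assuming nothing about Spark's actual classes or signatures. `ReadOnlyColumn`, `RowView`, and `ColumnarBatchSketch` are hypothetical names standing in for a read-only column vector, a row, and a columnar batch. The sketch shows row-based iteration in which each field is read from the underlying column on demand, so column elements are never copied into separate row objects.

```python
from typing import Any, Iterator, Sequence


class ReadOnlyColumn:
    """Hypothetical stand-in for a read-only column vector (e.g. one backed by Arrow data)."""

    def __init__(self, values: Sequence[Any]) -> None:
        self._values = values  # keep a reference only; the data is not copied

    def __len__(self) -> int:
        return len(self._values)

    def get(self, row_id: int) -> Any:
        return self._values[row_id]


class RowView:
    """A row that reads its fields from the columns on demand instead of copying them."""

    __slots__ = ("_columns", "_row_id")

    def __init__(self, columns: Sequence[ReadOnlyColumn], row_id: int) -> None:
        self._columns = columns
        self._row_id = row_id

    def get(self, ordinal: int) -> Any:
        # Look up field `ordinal` directly in its column at this row's position.
        return self._columns[ordinal].get(self._row_id)

    def to_tuple(self) -> tuple:
        # Materialize the row only when the caller explicitly asks for it.
        return tuple(col.get(self._row_id) for col in self._columns)


class ColumnarBatchSketch:
    """Hypothetical batch grouping several columns so they can be iterated row by row."""

    def __init__(self, columns: Sequence[ReadOnlyColumn]) -> None:
        lengths = {len(c) for c in columns}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        self._columns = list(columns)
        self.num_rows = lengths.pop() if lengths else 0

    def row_iterator(self) -> Iterator[RowView]:
        # Each row is a lightweight view (column references plus a row index).
        for row_id in range(self.num_rows):
            yield RowView(self._columns, row_id)


if __name__ == "__main__":
    ids = ReadOnlyColumn([1, 2, 3])
    names = ReadOnlyColumn(["a", "b", "c"])
    batch = ColumnarBatchSketch([ids, names])
    for row in batch.row_iterator():
        print(row.get(0), row.get(1))
```

Because each `RowView` holds only a reference to the columns and a row index, iterating the batch costs one small object per row rather than a copy of every field. A caller that needs an independent copy of a row can request one explicitly with `to_tuple()`.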

Issue Links

blocks

SPARK-20791 Use Apache Arrow to Improve Spark createDataFrame from Pandas.DataFrame

Resolved

is related to

SPARK-21472 Introduce ArrowColumnVector as a reader for Arrow vectors.

Resolved

links to

[Github] Pull Request #18787 (BryanCutler)

[Github] Pull Request #19098 (BryanCutler)

People

Assignee: Bryan Cutler

Reporter: Bryan Cutler

Votes: 0

Watchers: 4

Dates

Created: 31/Jul/17 17:31

Updated: 12/Dec/22 18:10

Resolved: 31/Aug/17 04:09