Spark / SPARK-12635

More efficient (column batch) serialization for Python/R

    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PySpark, SparkR, SQL
    • Labels: None

      Description

      Serialization between Scala / Python / R is pretty slow. Python and R both work pretty well with a column batch interface (e.g. numpy arrays). Technically we should be able to just pass column batches around with minimal serialization (maybe even zero-copy memory).

      Note that this depends on some internal refactoring to use a column batch interface in Spark SQL.
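
      As a rough illustration of the idea (plain base R, not Spark code): a whole numeric column can be written out as one contiguous buffer, which is close to a plain memory copy, whereas per-value serialization pays overhead on every element.

        col <- runif(1e6)                              # one column of one million doubles

        con <- rawConnection(raw(0), open = "wb")
        system.time(writeBin(col, con))                # whole column batch in a single call
        close(con)

        con <- rawConnection(raw(0), open = "wb")
        system.time(for (x in col) writeBin(x, con))   # value-by-value, as a per-element SerDe would
        close(con)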

        Activity

        shivaram Shivaram Venkataraman added a comment -

        cc Sun Rui who is working on the SparkR UDF support

        dselivanov Dmitriy Selivanov added a comment - edited

        Hi!
        First, thanks to all SparkR and Spark developers.
        I have just started to evaluate SparkR. I have tried it several times (ever since it was in AMPLab), but before 1.6 there were too many rough edges, so I used the Scala API. For now I see two main limiting issues (and they are interconnected):

        1. Lack of UDFs in the R interface. I saw SPARK-6817.
        2. And, I think more importantly, the lack of fast serialization / deserialization. I believe it is impossible to develop a useful R UDF interface without it.

        Consider the following example. I have a tiny cached Spark DataFrame with nrow=300k, ncol=25, and I want to collect it into a local R session:

        df_local <- collect(df)
        

        The resulting R data.frame is only ~70 MB(!), but it takes *120 sec* to collect it into R (compared to *7 sec* for df.toPandas() in PySpark).
        I did some profiling. Almost all of the time is spent in the call chain collect -> callJStatic -> invokeJava -> readObject. readObject makes a lot of read* calls from [deserialize.R](https://github.com/apache/spark/blob/c3d505602de2fd2361633f90e4fff7e041849e28/R/pkg/R/deserialize.R).

        So for now it is *much* faster to write the Spark DataFrame to plain CSV/JSON and then read it back into R. I have not looked at the Python serialization code. Is it different from R's? Why is there such a dramatic difference between R and Python?
        cc Sun Rui
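
        (For reference, a minimal sketch of how such a measurement and profile could be reproduced; `df` stands for an already cached SparkDataFrame of roughly the size described above, and the profile file name is arbitrary.)

          Rprof("collect_profile.out")
          elapsed <- system.time(df_local <- collect(df))
          Rprof(NULL)

          print(elapsed)                                          # wall-clock time of collect()
          print(object.size(df_local), units = "Mb")              # size of the resulting local data.frame
          head(summaryRprof("collect_profile.out")$by.total, 10)  # hottest calls (readObject, read*, ...)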

        sunrui Sun Rui added a comment - edited

        Dmitriy Selivanov PySpark uses pickle and CloudPickle on the Python side and net.razorvine.pickle on the JVM side for data serialization/deserialization between Python and the JVM. However, there is no comparable library that can deserialize from and serialize to R's serialization format. So currently SparkR relies on readBin()/writeBin() on the R side and Java's DataInputStream/DataOutputStream on the JVM side for serialization/deserialization between R and the JVM, based on the fact that simple types such as integers, doubles, and byte arrays share the same binary format.

        For collect(), serialization/deserialization happens interleaved with the socket communication. I suspect there is a lot of communication overhead from the many small socket reads/writes. Maybe we can change the behavior to work in batches, that is, serialize part of the collected result into an in-memory buffer and transfer it back in one shot (see the sketch below). Would you be interested in doing a prototype to see whether there is any performance improvement?

        Another idea would be to introduce something like net.razorvine.pickle for R, but that sounds like a lot of effort.
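
        (A rough sketch of the buffering idea on the R side; the function names are illustrative rather than SparkR's actual serialize.R helpers, `con` stands for the socket connection to the JVM backend, and all columns are assumed numeric. Big-endian matches Java's DataOutputStream/DataInputStream.)

          # Current style: many small writes straight to the socket.
          send_per_value <- function(con, df) {
            for (i in seq_len(nrow(df))) {
              for (j in seq_len(ncol(df))) {
                writeBin(as.double(df[[j]][i]), con, endian = "big")
              }
            }
          }

          # Batched style: serialize into an in-memory raw buffer first,
          # then send the length followed by the whole payload in one write.
          send_batched <- function(con, df) {
            buf <- rawConnection(raw(0), open = "wb")
            on.exit(close(buf))
            for (col in df) {
              writeBin(as.double(col), buf, endian = "big")   # whole column at once
            }
            payload <- rawConnectionValue(buf)
            writeBin(length(payload), con, endian = "big")    # payload size header
            writeBin(payload, con)                            # single bulk write
          }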

        dselivanov Dmitriy Selivanov added a comment -

        Thanks for the clarification! I want to give it a try, but I am not sure when I will have enough time.
        I have a question: why not just use the org.rosuda.REngine.REXP class from rJava and avoid creating the data.frame on the JVM side? http://rforge.net/org/doc/
        From a brief look over the SparkR code I can't understand why we reimplement a binary interface between R and Java. Why don't we use an existing one? (I may be missing something.) I also see this year-old thread: https://sparkr.atlassian.net/browse/SPARKR-145, but it ends with no decision.

        shivaram Shivaram Venkataraman added a comment -

        Just to clarify a couple of things - we should probably move this out to a new JIRA issue.

        • The main purpose of creating the SerDe library in SparkR was to enable inter-process communication (IPC) between R and the JVM that is flexible, works on multiple platforms, and does not need too many dependencies. By IPC, I mean having the ability to call methods on the JVM from R. The reason for implementing this in Spark was that we need the flexibility for either R or the JVM to come up first (as opposed to an embedded JVM), and also to make installing / deploying Spark easier.
        • Using the same SerDe mechanism for collect is just a natural extension, and as Spark is primarily tuned for distributed operations we haven't profiled / benchmarked collect performance so far. So your benchmarks are very useful and provide a baseline that we can improve on.
        • In terms of future improvements I see two things: (a) better benchmarks and profiling of the serialization costs (a baseline sketch follows this comment) – we will also need this for the UDF work, since data will similarly be transferred from the JVM to R and back; and (b) designing or adopting a faster serialization format for batch transfers like collect and UDFs.
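
        (A possible baseline harness for point (a); `df` stands for a cached SparkDataFrame, `sqlContext` follows the Spark 1.x SparkR API, and count() is only there to force the R -> JVM transfer. These names are assumptions, not an agreed design.)

          baseline <- list(
            jvm_to_r = system.time(local_df <- collect(df)),            # JVM -> R
            r_to_jvm = system.time({                                    # R -> JVM
              df2 <- createDataFrame(sqlContext, local_df)
              count(df2)
            })
          )
          print(sapply(baseline, `[[`, "elapsed"))                      # elapsed seconds per direction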

          People

          • Assignee: Unassigned
          • Reporter: rxin Reynold Xin
          • Votes: 4
          • Watchers: 20
