Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Description
Serialization between Scala, Python, and R is slow. Python and R both work well with a column-batch interface (e.g. numpy arrays), so in principle we should be able to pass column batches between the runtimes with minimal serialization, and possibly with zero-copy memory sharing.
Note that this depends on internal refactoring in Spark SQL to use a column-batch interface.
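To illustrate the idea only, here is a minimal sketch in plain Python and numpy. It is not Spark code and says nothing about how Spark would implement this. The helper names `to_column_batch` and `from_column_batch`, the two-column `(id, score)` schema, and the use of raw byte buffers as the transfer format are all assumptions made for the example. Instead of serializing each row as an object, each column is sent as one contiguous buffer, and the receiver wraps that buffer without copying it.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of Spark or PySpark.

def to_column_batch(rows):
    """Turn a list of (id, score) row tuples into one raw buffer per column."""
    ids = np.array([r[0] for r in rows], dtype=np.int64)
    scores = np.array([r[1] for r in rows], dtype=np.float64)
    # Each column becomes a single contiguous byte string.
    return {"id": ids.tobytes(), "score": scores.tobytes()}

def from_column_batch(batch):
    """Wrap the received buffers as numpy arrays without copying the data."""
    return {
        "id": np.frombuffer(batch["id"], dtype=np.int64),
        "score": np.frombuffer(batch["score"], dtype=np.float64),
    }

# Assumed sample data: 1,000 rows of (id, score).
rows = [(i, i * 0.5) for i in range(1000)]
batch = to_column_batch(rows)
columns = from_column_batch(batch)
```

The arrays returned by `from_column_batch` are read-only views over the received bytes, so the receiving side does no per-row deserialization. A real cross-process or cross-language transfer would also need an agreed schema and memory layout, which this sketch leaves out.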