Spark / SPARK-12635

More efficient (column batch) serialization for Python/R


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PySpark, SparkR, SQL

    Description

      Serialization between Scala, Python, and R is slow. Python and R both work well with a column-batch interface (e.g. NumPy arrays). In principle, we should be able to pass column batches around with minimal serialization (possibly even zero-copy memory).

      Note that this depends on some internal refactoring to use a column batch interface in Spark SQL.
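
      The payoff of column batches over row-at-a-time serialization can be sketched in plain Python. This is a hypothetical illustration using only the stdlib `array` module, not Spark's actual serialization path: each column becomes one contiguous typed buffer that the receiver can reconstruct without per-row decoding.

      ```python
      import array
      import pickle

      # Sample data: 1000 rows of (int64, float64).
      rows = [(i, float(i) * 0.5) for i in range(1000)]

      # Row-at-a-time: pickle each row tuple individually (per-row
      # object overhead on both the sender and the receiver).
      row_payload = b"".join(pickle.dumps(r) for r in rows)

      # Column batch: pack each column into one contiguous typed buffer.
      ids = array.array("q", (r[0] for r in rows))   # int64 column
      vals = array.array("d", (r[1] for r in rows))  # float64 column
      col_payload = ids.tobytes() + vals.tobytes()

      # The receiver rebuilds a whole column with one frombytes call
      # (close to zero-copy), instead of decoding 1000 Python objects.
      decoded_ids = array.array("q")
      decoded_ids.frombytes(col_payload[: len(rows) * 8])
      assert list(decoded_ids) == [r[0] for r in rows]

      # The column batch is also far more compact than per-row pickles.
      assert len(col_payload) < len(row_payload)
      ```

      The same idea extends to handing the buffers to NumPy or R without copying, which is what makes "minimal serialization" plausible.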


          People

            Assignee: Unassigned
            Reporter: Reynold Xin (rxin)
            Votes: 7
            Watchers: 23
