  1. Spark
  2. SPARK-22216

Improving PySpark/Pandas interoperability


    Details

    • Type: Epic
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels:
      None

      Description

      This is an umbrella ticket tracking the general effort to improve performance and interoperability between PySpark and Pandas. The core idea is to use Apache Arrow as the serialization format to reduce the overhead of moving data between PySpark and Pandas.
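
      As a rough illustration of why a columnar format can reduce that overhead, the toy sketch below contrasts serializing records one at a time with packing each column into a single contiguous typed buffer. It uses only the Python standard library and is not Spark's or Arrow's actual code path; the data and the function names (serialize_row_wise, serialize_columnar, deserialize_columnar) are hypothetical and exist only for this example.

```python
import pickle
from array import array

# Hypothetical toy data: 1,000 rows of (id, value).
rows = [(i, i * 0.5) for i in range(1000)]


def serialize_row_wise(rows):
    """Pickle every row separately: one serialized object per record."""
    return [pickle.dumps(row) for row in rows]


def serialize_columnar(rows):
    """Pack each column into one contiguous typed buffer.

    This only imitates the spirit of a columnar layout; real Arrow
    buffers also carry schema, validity bitmaps, and alignment.
    """
    ids = array("q", (r[0] for r in rows))     # 64-bit signed integers
    values = array("d", (r[1] for r in rows))  # 64-bit floats
    return ids.tobytes(), values.tobytes()


def deserialize_columnar(id_bytes, value_bytes):
    """Rebuild the rows from the two column buffers."""
    ids = array("q")
    ids.frombytes(id_bytes)
    values = array("d")
    values.frombytes(value_bytes)
    return list(zip(ids, values))


row_blobs = serialize_row_wise(rows)
id_bytes, value_bytes = serialize_columnar(rows)

# Row-wise produces one blob per record; columnar produces one buffer per column.
num_row_blobs = len(row_blobs)
columnar_size = len(id_bytes) + len(value_bytes)
round_trip = deserialize_columnar(id_bytes, value_bytes)
```

      The point of the sketch is the shape of the work: the row-wise path makes one serialization call per record, while the columnar path hands over two fixed-width buffers that a consumer can read in bulk.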

              People

              • Assignee:
                icexelloss Li Jin
                Reporter:
                icexelloss Li Jin
              • Votes:
                0
                Watchers:
                33
