SPARK-44486

Implement PyArrow `self_destruct` feature for `toPandas`

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 3.5.0, 4.0.0
    • Component/s: Connect, PySpark
    • Labels: None

    Description

      Make the Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` take effect in Spark Connect, so that it enables PyArrow's `self_destruct` feature there. This can save memory when creating a pandas DataFrame via `toPandas`, because Arrow-allocated memory is freed while the pandas DataFrame is being built.

          People

            Assignee: Xinrong Meng
            Reporter: Xinrong Meng
            Votes: 0
            Watchers: 2
