[SPARK-44486] Implement PyArrow `self_destruct` feature for `toPandas` - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0
Fix Version/s: 3.5.0, 4.0.0
Component/s: Connect, PySpark
Labels: None

Description

Implement PyArrow `self_destruct` feature for `toPandas`

Make the Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` enable PyArrow's `self_destruct` feature in Spark Connect. This can reduce memory usage when creating a pandas DataFrame via `toPandas`, because Arrow-allocated memory is freed while the pandas DataFrame is being built instead of being held until the conversion finishes.

People

Assignee: Xinrong Meng

Reporter: Xinrong Meng

Votes: 0

Watchers: 2

Dates

Created: 20/Jul/23 00:09

Updated: 25/Jul/23 00:44

Resolved: 25/Jul/23 00:44