Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.0
- Fix Version/s: None
- Component/s: None
Description
In 2.4.0, spark.executor.pyspark.memory was added to limit the total memory space of a Python worker. There is another RDD setting, spark.python.worker.memory, that controls when Spark decides to spill data to disk. These settings are currently similar, but not related to one another.
PySpark should probably use spark.executor.pyspark.memory to limit or default the setting of spark.python.worker.memory, because the latter property controls spilling and should be lower than the total memory limit. Renaming spark.python.worker.memory would also improve clarity, because its name suggests it controls the limit, while it actually behaves more like the JVM setting spark.memory.fraction.
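For illustration, a minimal configuration sketch of the relationship described above: the config keys come from this issue, while the application name, memory values, and script structure are example assumptions only. It keeps the spill threshold (spark.python.worker.memory) well below the total Python worker limit (spark.executor.pyspark.memory).

    from pyspark.sql import SparkSession

    # Example values only. The spill threshold is kept well below the
    # total per-worker Python memory limit so spilling starts long
    # before the hard limit is reached.
    spark = (
        SparkSession.builder
        .appName("pyspark-memory-example")  # hypothetical app name
        # Total memory limit for each Python worker (added in 2.4.0).
        .config("spark.executor.pyspark.memory", "2g")
        # Threshold at which Spark spills aggregation data to disk;
        # should stay below the limit above.
        .config("spark.python.worker.memory", "512m")
        .getOrCreate()
    )

    spark.range(10).count()
    spark.stop()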
Issue Links
- relates to SPARK-25004: Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS (Resolved)