Spark / SPARK-26679

Deconflict spark.executor.pyspark.memory and spark.python.worker.memory


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      In 2.4.0, spark.executor.pyspark.memory was added to limit the total memory space of a Python worker. There is another, RDD-level setting, spark.python.worker.memory, that controls when Spark decides to spill data to disk. The two settings sound similar but are currently not related to one another.
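      For illustration, a minimal PySpark session sketch that sets both properties side by side; the values are arbitrary examples, not recommendations:

      from pyspark.sql import SparkSession

      spark = (
          SparkSession.builder
          .appName("pyspark-memory-demo")
          # Hard limit on Python worker memory per executor (added in 2.4.0).
          .config("spark.executor.pyspark.memory", "2g")
          # Threshold at which RDD aggregation in the Python worker spills to
          # disk; today this is set independently of the limit above.
          .config("spark.python.worker.memory", "512m")
          .getOrCreate()
      )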

      PySpark should probably use spark.executor.pyspark.memory to cap or provide a default for spark.python.worker.memory, because the latter property controls spilling and should be lower than the total memory limit. Renaming spark.python.worker.memory would also help clarity: it sounds like it should control the limit, but it is closer in spirit to the JVM setting spark.memory.fraction.
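      A rough sketch of the defaulting behavior proposed above (illustrative only: the helper name, MiB units, and the one-half default fraction are assumptions, not Spark code):

      def default_python_worker_memory(pyspark_limit_mib, worker_memory_mib=None):
          """Derive spark.python.worker.memory from spark.executor.pyspark.memory.

          If the spill threshold is unset, default it to a fraction of the hard
          limit; if it is set, cap it so it never exceeds the limit.
          """
          if worker_memory_mib is None:
              return pyspark_limit_mib // 2  # assumed default fraction, for illustration
          return min(worker_memory_mib, pyspark_limit_mib)

      # Example: with a 2048 MiB hard limit and no explicit worker memory,
      # the spill threshold would default to 1024 MiB.
      assert default_python_worker_memory(2048) == 1024
      assert default_python_worker_memory(2048, 4096) == 2048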

    People

      Assignee: Unassigned
      Reporter: Ryan Blue (rdblue)
      Votes: 1
      Watchers: 9

    Dates

      Created:
      Updated: