Spark / SPARK-25004

Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: PySpark
    • Labels:
      None

      Description

      Some platforms support limiting Python's addressable memory space by setting resource.RLIMIT_AS.

      We've found that adding a limit is very useful when running in YARN: when Python doesn't know about memory constraints, it doesn't know when to garbage collect and continues to hold memory it no longer needs. Adding a limit reduces PySpark's memory consumption and prevents YARN from killing containers because Python hasn't released memory.
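      The mechanism described above can be sketched with Python's standard resource module. This is a minimal illustration of capping a process's address space, not the exact PySpark worker code; the function name and the 512 MiB limit are chosen for the example. Setting only the soft limit keeps it restorable later:

        import resource

        def set_address_space_limit(limit_bytes):
            # Cap the soft RLIMIT_AS; the hard limit is left unchanged
            # so the soft limit can be restored afterwards.
            soft, hard = resource.getrlimit(resource.RLIMIT_AS)
            resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
            return soft, hard

        old_soft, old_hard = set_address_space_limit(512 * 1024 * 1024)

        hit_limit = False
        try:
            buf = bytearray(1024 * 1024 * 1024)  # try to allocate ~1 GiB
        except MemoryError:
            # With the limit in place, the oversized allocation fails fast
            # with a Python MemoryError instead of the container being killed.
            hit_limit = True

        # Restore the original soft limit.
        resource.setrlimit(resource.RLIMIT_AS, (old_soft, old_hard))
        print("MemoryError raised:", hit_limit)

      Note that RLIMIT_AS is enforced on Linux; behavior differs on some other platforms.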

      This also improves error messages for users, allowing them to see when Python is allocating too much memory instead of YARN killing the container:

        File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer
          fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
        File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
          comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, []))
        File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare
          permutations = sorted(permutations, reverse=True)
        MemoryError
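
      With the fix (Spark 2.4+), the limit is controlled by the new config. A sketch of enabling it at submit time; the application file name here is hypothetical:

        # Cap each executor's Python worker memory; PySpark applies the
        # value via resource.RLIMIT_AS in the worker.
        spark-submit \
          --conf spark.executor.pyspark.memory=2g \
          my_app.py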
      


      People

      • Assignee: rdblue Ryan Blue
      • Reporter: rdblue Ryan Blue
      • Votes: 1
      • Watchers: 9

      Dates

      • Created:
      • Updated:
      • Resolved: