Description
Some platforms support limiting Python's addressable memory space via resource.RLIMIT_AS.
We've found that adding such a limit is very useful when running in YARN: when Python doesn't know about memory constraints, it doesn't know when to garbage collect and keeps holding memory it no longer needs. Adding a limit reduces PySpark memory consumption and avoids YARN killing containers because Python hasn't cleaned up memory.
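A minimal sketch of the mechanism, assuming a Unix platform (the resource module's rlimit calls are Unix-only); the helper name set_worker_memory_limit and the 512 MiB cap are illustrative, not the actual worker.py change:

import resource

# Illustrative helper (hypothetical name): cap the interpreter's addressable
# memory so oversized allocations raise MemoryError inside Python rather than
# the container being killed from outside.
def set_worker_memory_limit(limit_bytes):
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Only the soft limit is lowered; raising the hard limit needs privileges.
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

if __name__ == "__main__":
    set_worker_memory_limit(512 * 1024 * 1024)  # 512 MiB cap (illustrative)
    try:
        buf = bytearray(1024 * 1024 * 1024)  # 1 GiB allocation exceeds the cap
    except MemoryError:
        print("Allocation refused via RLIMIT_AS instead of a container kill")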
This also improves error messages for users, allowing them to see when Python is allocating too much memory instead of YARN killing the container:
File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer fe_eval_rec.update(f(src_rec_prep, mat_rec_prep)) File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, [])) File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare permutations = sorted(permutations, reverse=True) MemoryError
Issue Links
- is depended upon by
  - SPARK-25021 Add spark.executor.pyspark.memory support to Kubernetes (Resolved)
  - SPARK-25022 Add spark.executor.pyspark.memory support to Mesos (Resolved)
- is related to
  - SPARK-26080 Unable to run worker.py on Windows (Resolved)
  - SPARK-26679 Deconflict spark.executor.pyspark.memory and spark.python.worker.memory (Open)
  - SPARK-26743 Add a test to check the actual resource limit set via 'spark.executor.pyspark.memory' (Resolved)