Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 2.3.3, 2.4.3, 3.0.0
Description
While testing hardware with more cores, we found that the memory required by PySpark applications increased, and we tracked the problem to importing numpy. The NumPy issue is https://github.com/numpy/numpy/issues/10455
NumPy uses OpenMP, which starts a thread pool sized to the number of cores on the machine (and does not respect cgroups). When we set this lower, we see a significant reduction in memory consumption.
This parallelism setting (OMP_NUM_THREADS) should be set to the number of cores allocated to the executor, not the number of cores available on the machine.
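As a minimal sketch of the workaround from user code (not the patch that resolved this issue), the executor core count can be forwarded to the Python workers as OMP_NUM_THREADS through Spark's executor environment settings. The value of 4 cores below is an illustrative assumption and should match the job's spark.executor.cores; spark.executor.cores and spark.executorEnv.* are standard Spark configuration properties.
{code:python}
from pyspark.sql import SparkSession

# Illustrative assumption: the job is allocated 4 cores per executor.
executor_cores = "4"

spark = (
    SparkSession.builder
    .config("spark.executor.cores", executor_cores)
    # spark.executorEnv.* propagates environment variables to executor
    # processes, so the Python workers inherit the capped thread count
    # before numpy (and its OpenMP-backed BLAS) is imported.
    .config("spark.executorEnv.OMP_NUM_THREADS", executor_cores)
    .getOrCreate()
)
{code}
The linked issues (SPARK-28846, SPARK-42613) track setting OMP_NUM_THREADS automatically in PythonRunner rather than requiring this per-job configuration.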
Issue Links
- is duplicated by
  - SPARK-28846 Set OMP_NUM_THREADS to executor cores for python (Closed)
- relates to
  - SPARK-42613 PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default (Resolved)