Description
Run this PySpark script with `spark.executor.cores=1`:
```python
import os

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

var_name = 'OMP_NUM_THREADS'

def get_env_var():
    # Runs inside the Python UDF worker, so this reports the worker's environment.
    return os.getenv(var_name)

udf_get_env_var = udf(get_env_var)

spark.range(1).toDF("id").withColumn(f"env_{var_name}", udf_get_env_var()).show(truncate=False)
```
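For reference, one way to launch the repro with the required setting (assuming the script is saved as `repro.py`, a file name of my choosing):

```bash
spark-submit --conf spark.executor.cores=1 repro.py
```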
Output with release `3.3.2`:
```
+---+-------------------+
|id |env_OMP_NUM_THREADS|
+---+-------------------+
|0  |null               |
+---+-------------------+
```
Output with release `3.3.0`:
```
+---+-------------------+
|id |env_OMP_NUM_THREADS|
+---+-------------------+
|0  |1                  |
+---+-------------------+
```
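A possible workaround (my own suggestion, not part of the original report) is to set the variable explicitly through Spark's documented `spark.executorEnv.[EnvironmentVariableName]` mechanism instead of relying on the default:

```python
from pyspark.sql import SparkSession

# Sketch of a workaround, assuming the Python UDF worker inherits the
# executor process environment, so spark.executorEnv.* reaches os.getenv().
spark = (
    SparkSession.builder
    .config("spark.executor.cores", "1")
    # Pin OMP_NUM_THREADS explicitly rather than depending on the release default.
    .config("spark.executorEnv.OMP_NUM_THREADS", "1")
    .getOrCreate()
)
```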
Issue Links
- is caused by SPARK-41188: Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes (Resolved)
- relates to SPARK-42607: [MESOS] OMP_NUM_THREADS not set to number of executor cores by default (Resolved)
- relates to SPARK-42613: PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default (Resolved)