- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 2.3.3
- Fix Version/s: None
- Component/s: Deploy, Documentation, Spark Submit, Web UI
- Labels: None
The doc says that "In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file." However, when spark.local.dir is set through --conf with spark-submit, Spark still uses the value from ${SPARK_HOME}/conf/spark-defaults.conf. What is worse, the environment section of the Spark runtime Web UI shows the value from --conf, which is really misleading.
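For reference, under the documented precedence a value set programmatically on a SparkConf should beat both --conf and the defaults file. A minimal sketch of what the docs describe (the app name is hypothetical):

{code:scala}
import org.apache.spark.SparkConf

// Per the documented precedence, an explicit SparkConf.set() should
// override both --conf flags and spark-defaults.conf entries.
val conf = new SparkConf()
  .setAppName("precedence-check") // hypothetical app name
  .set("spark.local.dir", "/tmp/spark_local")
{code}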
For example, I submit my application with the command:
/opt/spark233/bin/spark-submit --properties-file /opt/spark.conf --conf spark.local.dir=/tmp/spark_local -v --class org.apache.spark.examples.mllib.SparseNaiveBayes --master spark://bdw-slave20:7077 /opt/sparkbench/assembly/target/sparkbench-assembly-7.1-SNAPSHOT-dist.jar hdfs://bdw-slave20:8020/Bayes/Input
The spark.local.dir setting in ${SPARK_HOME}/conf/spark-defaults.conf is:
spark.local.dir=/mnt/nvme1/spark_local
When the application is running, I found that the intermediate shuffle data was written to /mnt/nvme1/spark_local, the value set in ${SPARK_HOME}/conf/spark-defaults.conf, while the Web UI shows the environment value spark.local.dir=/tmp/spark_local.
The spark-submit verbose output also shows spark.local.dir=/tmp/spark_local, which is misleading.
spark-submit verbose output:
XXXX
Spark properties used, including those specified through
--conf and those from the properties file /opt/spark.conf:
(spark.local.dir,/tmp/spark_local)
(spark.default.parallelism,132)
(spark.driver.memory,10g)
(spark.executor.memory,352g)
XXXXX
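A quick way to see which value the driver actually resolved (a sketch, assuming a spark-shell launched with the same --conf flag and properties file, so that sc already exists):

{code:scala}
// Print the spark.local.dir value held by the driver's SparkConf.
// In this report it returns /tmp/spark_local (the --conf value),
// even though the executors write shuffle data under the
// spark-defaults.conf value /mnt/nvme1/spark_local.
println(sc.getConf.get("spark.local.dir", "<not set>"))
{code}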