Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version: 1.4.1
- None
Environment
Ubuntu 14.04
Mesos 0.23 (compiled from source following the instructions on the Mesos site)
Spark 1.4 prebuilt for Hadoop 2.6

The test setup was a two-node Mesos cluster: one dedicated master and one dedicated slave. Spark submissions occurred on the master and were directed at a Mesos dispatcher running on the master.
Description
This issue definitely affects Mesos mode, but may affect complex standalone topologies as well.
When running spark-submit with --deploy-mode cluster, environment variables set in spark-env.sh that are not prefixed with SPARK_ are not available in the driver process. The behavior I expect is that any variable set in spark-env.sh is available on the driver and all executors.
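For illustration, a spark-env.sh along these lines (the variable names are hypothetical) shows the distinction; only the SPARK_-prefixed variable ends up in the cluster-mode driver's environment:

# conf/spark-env.sh, sourced by load-spark-env.sh on each node
# Hypothetical variables, used only to illustrate the reported behavior.
export MY_CUSTOM_VAR="set for executors, but missing in the cluster-mode driver"
export SPARK_CUSTOM_VAR="present in the driver, because SPARK_-prefixed variables are forwarded"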
spark-env.sh is executed by load-spark-env.sh, which uses an environment variable, SPARK_ENV_LOADED, to ensure that it is only run once. When using the RestSubmissionClient, spark-submit propagates all environment variables that are prefixed with SPARK_ to the MesosRestServer, where they are used to initialize the driver. During this process, SPARK_ENV_LOADED is propagated to the new driver process, since running spark-submit has already caused load-spark-env.sh to be run on the submitter's machine. Now, when load-spark-env.sh is called by MesosClusterScheduler, SPARK_ENV_LOADED is already set and spark-env.sh is never sourced.
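The guard in load-spark-env.sh looks roughly like this (a simplified sketch, not the verbatim script):

if [ -z "$SPARK_ENV_LOADED" ]; then
  export SPARK_ENV_LOADED=1
  # Export everything declared in the user's spark-env.sh
  if [ -f "${SPARK_CONF_DIR}/spark-env.sh" ]; then
    set -a
    . "${SPARK_CONF_DIR}/spark-env.sh"
    set +a
  fi
fi
# If SPARK_ENV_LOADED leaked into the driver's environment from the submitter's
# machine, the block above is skipped and spark-env.sh is never sourced.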
This gist shows the testing setup I used while investigating this issue. An example invocation looked like:
spark-1.5.0-SNAPSHOT-bin-custom-spark/bin/spark-submit --deploy-mode cluster --master mesos://172.31.34.154:7077 --class Test spark-env-var-test_2.10-0.1-SNAPSHOT.jar