SPARK-9672

Drivers run in cluster mode on Mesos may not have spark-env variables available


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1
    • Fix Version/s: 1.6.0
    • Component/s: Mesos, Spark Submit
    • Labels: None

    Description

      This issue definitely affects Mesos mode, but it may affect complex standalone topologies as well.

      When running spark-submit with

      --deploy-mode cluster

      environment variables set in spark-env.sh that are not prefixed with SPARK_ are not available in the driver process. The behavior I expect is that any variables set in spark-env.sh are available on the driver and all executors.
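For illustration, a hypothetical conf/spark-env.sh like the one below demonstrates the split (MY_APP_CONFIG is an invented name; it stands in for any non-SPARK_-prefixed variable a user might set):

```shell
# conf/spark-env.sh (sourced by load-spark-env.sh)
export SPARK_LOCAL_DIRS=/mnt/spark   # SPARK_ prefix: propagated to the cluster-mode driver
export MY_APP_CONFIG=/etc/myapp.conf # hypothetical; expected on the driver, but missing in Mesos cluster mode
```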

      spark-env.sh is executed by load-spark-env.sh, which uses an environment variable, SPARK_ENV_LOADED [code], to ensure that it is only run once. When using the RestSubmissionClient, spark-submit propagates all environment variables that are prefixed with SPARK_ [code] to the MesosRestServer, where they are used to initialize the driver [code]. During this process, SPARK_ENV_LOADED is propagated to the new driver process, since running spark-submit has caused load-spark-env.sh to be run on the submitter's machine [code]. Now, when load-spark-env.sh is called by the MesosClusterScheduler, SPARK_ENV_LOADED is already set, so spark-env.sh is never sourced.
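The failure mode can be sketched with a minimal stand-in for the once-only guard in load-spark-env.sh (echo statements replace the real sourcing so the behavior is observable; the actual script sources ${SPARK_CONF_DIR}/spark-env.sh at that point):

```shell
#!/usr/bin/env bash
# Minimal sketch of load-spark-env.sh's once-only guard.
# If SPARK_ENV_LOADED leaks into the driver's environment (because
# spark-submit forwards all SPARK_-prefixed variables in cluster mode),
# the guard fires and spark-env.sh is silently skipped on the driver.
load_spark_env() {
  if [ -z "${SPARK_ENV_LOADED}" ]; then
    export SPARK_ENV_LOADED=1
    # real script: . "${SPARK_CONF_DIR:-conf}/spark-env.sh"
    echo "sourced spark-env.sh"
  else
    echo "skipped: SPARK_ENV_LOADED already set"
  fi
}

load_spark_env   # submitter's machine: sources the file, exports the guard
load_spark_env   # driver inheriting the guard variable: skipped
```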

      This gist shows the testing setup I used while investigating this issue. An example invocation looked like

      spark-1.5.0-SNAPSHOT-bin-custom-spark/bin/spark-submit --deploy-mode cluster --master mesos://172.31.34.154:7077 --class Test spark-env-var-test_2.10-0.1-SNAPSHOT.jar


          People

            Assignee: pashields Patrick Shields
            Reporter: pashields Patrick Shields
            Votes: 0
            Watchers: 2
