  Spark / SPARK-24456

Spark submit - server environment variables are overwritten by client environment variables


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Submit

    Description

      When submitting a Spark application with --deploy-mode cluster to a Spark standalone cluster, environment variables from the client machine overwrite the server's environment variables.

       

      We use the SPARK_DIST_CLASSPATH environment variable to add extra required dependencies to the application. We observed that the client machine's SPARK_DIST_CLASSPATH overwrites the remote server machine's value, resulting in application submission failure.

       

      We have inspected the code and found:

      1. In org.apache.spark.deploy.Client line 86:

      val command = new Command(mainClass,
        Seq("{{WORKER_URL}}", "{{USER_JAR}}", driverArgs.mainClass) ++ driverArgs.driverOptions,
        sys.env, classPathEntries, libraryPathEntries, javaOpts)

      2. In org.apache.spark.launcher.WorkerCommandBuilder, lines 35-36:

      childEnv.putAll(command.environment.asJava)
      childEnv.put(CommandBuilderUtils.ENV_SPARK_HOME, sparkHome)

      Line 35 shows the server machine's environment being overwritten with the client's values, while line 36 restores SPARK_HOME to the server value.
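
      For illustration, the clobber-then-restore sequence can be reproduced in isolation; the maps and paths below are invented stand-ins, not values from our cluster (paste into a Scala REPL):

      import scala.collection.JavaConverters._

      // Stand-in for the worker's own environment on the server machine.
      val childEnv = new java.util.HashMap[String, String]()
      childEnv.put("SPARK_HOME", "/opt/spark")
      childEnv.put("SPARK_DIST_CLASSPATH", "/opt/spark/extra/*")

      // Stand-in for command.environment, i.e. the client's sys.env.
      val clientEnv = Map(
        "SPARK_HOME" -> "/home/alice/spark",
        "SPARK_DIST_CLASSPATH" -> "/home/alice/deps/*")

      childEnv.putAll(clientEnv.asJava)        // line 35: client values clobber both keys
      childEnv.put("SPARK_HOME", "/opt/spark") // line 36: only SPARK_HOME is put back

      childEnv.get("SPARK_DIST_CLASSPATH")     // still "/home/alice/deps/*", a client-only path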

      We think the bug can be fixed by adding a line that restores SPARK_DIST_CLASSPATH to its server value, similar to what is already done for SPARK_HOME. A sketch follows below.
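
      A minimal, untested sketch of that idea in WorkerCommandBuilder, only to make the proposal concrete (we assume the variable has no named constant in CommandBuilderUtils and use the string literal directly):

      childEnv.putAll(command.environment.asJava)
      childEnv.put(CommandBuilderUtils.ENV_SPARK_HOME, sparkHome)
      // Proposed addition: restore the worker-local value, mirroring the
      // SPARK_HOME handling above. sys.env here is the worker JVM's
      // environment, not the environment shipped from the client.
      sys.env.get("SPARK_DIST_CLASSPATH").foreach { serverValue =>
        childEnv.put("SPARK_DIST_CLASSPATH", serverValue)
      }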

       

          People

            Assignee: Unassigned
            Reporter: Alon Shoham (lonchu)
            Votes: 0
            Watchers: 1
