SPARK-24456: Spark submit - server environment variables are overwritten by client environment variables


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Labels: None

      Description

      When submitting a Spark application with --deploy-mode cluster to a Spark standalone cluster, environment variables from the client machine overwrite the server's environment variables.

      We use the SPARK_DIST_CLASSPATH environment variable to add extra required dependencies to the application. We observed that the client machine's SPARK_DIST_CLASSPATH overwrites the remote server machine's value, resulting in application submission failure.

      We inspected the code and found:

      1. In org.apache.spark.deploy.Client, line 86:

      // sys.env is the client machine's entire environment; it is placed in
      // the Command that is sent to the cluster:
      val command = new Command(mainClass,
        Seq("{{WORKER_URL}}", "{{USER_JAR}}", driverArgs.mainClass) ++ driverArgs.driverOptions,
        sys.env, classPathEntries, libraryPathEntries, javaOpts)

      2. In org.apache.spark.launcher.WorkerCommandBuilder, lines 35-36:

      childEnv.putAll(command.environment.asJava)                  // line 35
      childEnv.put(CommandBuilderUtils.ENV_SPARK_HOME, sparkHome)  // line 36

      Line 35 shows the server machine's environment being overwritten with the client values, while line 36 restores SPARK_HOME to its server value. The sketch below makes the effect concrete.
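
      A minimal self-contained sketch of the overwrite; all paths and values are illustrative, not taken from a real deployment:

      import scala.collection.mutable

      // Environment as seen on the worker before the command env is applied.
      val serverEnv = mutable.Map(
        "SPARK_HOME"           -> "/opt/spark",
        "SPARK_DIST_CLASSPATH" -> "/opt/spark/extra-jars/*")

      // What arrives in command.environment from the client machine.
      val clientCommandEnv = Map(
        "SPARK_HOME"           -> "/home/alice/spark",
        "SPARK_DIST_CLASSPATH" -> "/home/alice/jars/*")

      serverEnv ++= clientCommandEnv          // line 35: client values clobber server values
      serverEnv("SPARK_HOME") = "/opt/spark"  // line 36: only SPARK_HOME is restored

      // SPARK_DIST_CLASSPATH is left pointing at a client-only path -- the bug.
      assert(serverEnv("SPARK_DIST_CLASSPATH") == "/home/alice/jars/*")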

      We think the bug can be fixed by adding a line that restores SPARK_DIST_CLASSPATH to its server value, similar to what is already done for SPARK_HOME. A sketch of that change follows.
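
      A sketch of the suggested change in WorkerCommandBuilder.buildCommand. It assumes the worker-local value can be read back from sys.env; the literal variable name is used because we do not assume a SPARK_DIST_CLASSPATH constant exists in CommandBuilderUtils:

      childEnv.putAll(command.environment.asJava)
      childEnv.put(CommandBuilderUtils.ENV_SPARK_HOME, sparkHome)
      // Proposed addition: restore the worker-local SPARK_DIST_CLASSPATH, the
      // same way SPARK_HOME is restored above, so the client's value cannot
      // shadow the server's extra dependencies.
      sys.env.get("SPARK_DIST_CLASSPATH").foreach { serverValue =>
        childEnv.put("SPARK_DIST_CLASSPATH", serverValue)
      }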

            People

            • Assignee: Unassigned
            • Reporter: Alon Shoham (lonchu)