Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version/s: 2.3.0
- Fix Version/s: None
Description
When submitting a Spark application with --deploy-mode cluster to a Spark standalone cluster, environment variables from the client machine overwrite the server's environment variables.
We use the SPARK_DIST_CLASSPATH environment variable to add extra required dependencies to the application. We observed that the client machine's SPARK_DIST_CLASSPATH overwrites the remote server machine's value, resulting in application submission failure.
We have inspected the code and found:
1. In org.apache.spark.deploy.Client, line 86, the client's entire environment (sys.env) is captured into the driver Command:

val command = new Command(mainClass,
  Seq("{{WORKER_URL}}", "{{USER_JAR}}", driverArgs.mainClass) ++ driverArgs.driverOptions,
  sys.env, classPathEntries, libraryPathEntries, javaOpts)
2. In org.apache.spark.launcher.WorkerCommandBuilder, lines 35-36:

childEnv.putAll(command.environment.asJava)                  // line 35
childEnv.put(CommandBuilderUtils.ENV_SPARK_HOME, sparkHome)  // line 36
Line 35 overwrites the server machine's environment with the values shipped from the client, but line 36 then restores SPARK_HOME to the server's value.
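To illustrate why line 35 clobbers the server's values: java.util.Map.putAll replaces any existing entry whose key also appears in the argument map. A minimal, self-contained sketch (the paths are hypothetical, for illustration only):

import scala.collection.JavaConverters._

val childEnv = new java.util.HashMap[String, String]()
childEnv.put("SPARK_DIST_CLASSPATH", "/opt/server/extra-jars/*")       // server's value
val clientEnv = Map("SPARK_DIST_CLASSPATH" -> "/home/user/client-jars/*")
childEnv.putAll(clientEnv.asJava)                                      // client's value wins
assert(childEnv.get("SPARK_DIST_CLASSPATH") == "/home/user/client-jars/*")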
We think the bug can be fixed by adding a line that restores SPARK_DIST_CLASSPATH to its server value, similar to how SPARK_HOME is handled.
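A minimal sketch of what such a fix could look like inside WorkerCommandBuilder, where childEnv and command are already in scope (the restore block is our assumption of the fix, not a committed patch; it assumes the server's value is still visible via sys.env at this point, since this code runs on the worker):

childEnv.putAll(command.environment.asJava)
childEnv.put(CommandBuilderUtils.ENV_SPARK_HOME, sparkHome)
// Proposed addition: restore the server's own SPARK_DIST_CLASSPATH,
// mirroring the SPARK_HOME handling above, so the client's value does
// not leak into the driver's environment.
sys.env.get("SPARK_DIST_CLASSPATH").foreach { serverValue =>
  childEnv.put("SPARK_DIST_CLASSPATH", serverValue)
}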