A Spark client running on Windows fails to work with a YARN cluster running on Linux; this is a cross-platform problem.
The error occurs when 'yarn-client' mode is used.
(yarn-cluster/yarn-standalone mode was not tested.)
On the YARN side, Hadoop 2.4.0 resolved this issue, but the Spark YARN module does not incorporate the new YARN API yet, so the problem persists for Spark.
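To illustrate why this is a cross-platform problem: when the classpath for the container is assembled on the client with java.io.File.pathSeparator, a Windows client uses ';' while the Linux node that executes launch_container.sh needs ':'. The sketch below simulates that client-side join (joinClasspath is a hypothetical helper, not Spark code):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class SeparatorDemo {
    // Hypothetical helper mimicking what a client does when it joins
    // classpath entries with its local File.pathSeparator.
    static String joinClasspath(String sep, String... entries) {
        return Arrays.stream(entries).collect(Collectors.joining(sep));
    }

    public static void main(String[] args) {
        String windowsSep = ";"; // File.pathSeparator on a Windows client
        String linuxSep = ":";   // the separator launch_container.sh needs

        // A Windows client bakes ';' into the generated shell script:
        String bad = joinClasspath(windowsSep, "$HADOOP_CONF_DIR", "$PWD/*");
        // In a POSIX shell, an unquoted ';' ends the statement, so the
        // export line becomes an invalid/misparsed command on Linux.
        System.out.println("export CLASSPATH=" + bad);

        // What the Linux node actually needs:
        String good = joinClasspath(linuxSep, "$HADOOP_CONF_DIR", "$PWD/*");
        System.out.println("export CLASSPATH=" + good);
    }
}
```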
First, the affected source files in the Spark YARN module should be changed as follows:
- Replace .$() with .$$().
- Replace File.pathSeparator with ApplicationConstants.CLASS_PATH_SEPARATOR when building Environment.CLASSPATH.name (importing org.apache.hadoop.yarn.api.ApplicationConstants is required for this).
Unless the above changes are applied, launch_container.sh will contain invalid shell statements (since they embed Windows-specific separators), and the job will fail.
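The intent of the two replacements above can be sketched as follows. These are simplified stand-ins modeled after the Hadoop 2.4 API, not the real org.apache.hadoop.yarn.api classes: $() expands a variable on the client using the client OS syntax, while $$() and CLASS_PATH_SEPARATOR emit neutral tokens that the NodeManager expands on the server, so the server OS decides the final syntax:

```java
public class EnvExpansionSketch {
    // Illustrative stand-in for Environment.VAR.$():
    // expansion happens on the *client*, in the client OS syntax.
    static String dollar(String var, boolean clientIsWindows) {
        return clientIsWindows ? "%" + var + "%" : "$" + var;
    }

    // Illustrative stand-in for Environment.VAR.$$():
    // a placeholder token the NodeManager expands on the *server*.
    static String dollarDollar(String var) {
        return "{{" + var + "}}";
    }

    // Illustrative stand-in for ApplicationConstants.CLASS_PATH_SEPARATOR:
    // a token replaced with the correct separator on the node that
    // actually runs the container.
    static final String CLASS_PATH_SEPARATOR = "<CPS>";

    public static void main(String[] args) {
        // With $(), a Windows client bakes Windows syntax into the script:
        System.out.println(dollar("CLASSPATH", true));
        // With $$(), the script carries a neutral token instead, and the
        // Linux NodeManager resolves it with Linux syntax at launch time:
        System.out.println(dollarDollar("CLASSPATH"));
        System.out.println("a" + CLASS_PATH_SEPARATOR + "b");
    }
}
```

The key design point is deferral: nothing OS-specific is decided until the container is launched on the node that owns it.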
Also, the following symptoms should be fixed (I could not find the relevant source code):
- The SPARK_HOME environment variable is copied verbatim into launch_container.sh. It should be converted to the path format of the server OS, or, better, a separate environment variable or configuration variable should be introduced.
- The '%HADOOP_MAPRED_HOME%' string still appears in launch_container.sh after the above changes are applied; maybe I missed a few lines.
I'm not sure whether this covers everything, since I'm new to both Spark and YARN.