Description
When submitting a driver using spark-submit in cluster mode, Spark 1.1.0 allowed you to omit the file:// or hdfs:// prefix from the application JAR URL, e.g.
./bin/spark-submit --deploy-mode cluster --master spark://joshs-mbp.att.net:7077 --class org.apache.spark.examples.SparkPi /Users/joshrosen/Documents/old-spark-releases/spark-1.1.0-bin-hadoop1/lib/spark-examples-1.1.0-hadoop1.0.4.jar
In Spark 1.1.1 and 1.2.0, this same command now fails with an error:
./bin/spark-submit --deploy-mode cluster --master spark://joshs-mbp.att.net:7077 --class org.apache.spark.examples.SparkPi /Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar

Jar url 'file:/Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar' is not in valid format.
Must be a jar file path in URL format (e.g. hdfs://XX.jar, file://XX.jar)

Usage: DriverClient [options] launch <active-master> <jar-url> <main-class> [driver options]
Usage: DriverClient kill <active-master> <driver-id>
I tried changing my URL to conform to the new format, but this resulted in either a validation error or a submitted driver that failed:
./bin/spark-submit --deploy-mode cluster --master spark://joshs-mbp.att.net:7077 --class org.apache.spark.examples.SparkPi file:///Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar

Jar url 'file:///Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar' is not in valid format.
Must be a jar file path in URL format (e.g. hdfs://XX.jar, file://XX.jar)
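The rejection of the file:/// form suggests the validator requires a non-empty segment between the scheme's double slash and the path, which a triple-slash URL does not have. A minimal sketch of that kind of check (this regex is an illustration of the observed behavior, not the actual Spark validation code):

```java
import java.util.regex.Pattern;

public class JarUrlPatternDemo {
    public static void main(String[] args) {
        // Hypothetical strict pattern: scheme, "://", a non-empty non-slash
        // segment (the "host"), then a path ending in .jar.
        Pattern p = Pattern.compile("[a-zA-Z]+://[^/]+/.*\\.jar");

        // Matches: "host" segment is present.
        System.out.println(p.matcher("hdfs://XX/app.jar").matches());

        // Matches: "Users" fills the host slot (which is exactly the problem).
        System.out.println(p.matcher("file://Users/joshrosen/app.jar").matches());

        // Fails: file:/// has an empty authority, so [^/]+ cannot match.
        System.out.println(p.matcher("file:///Users/joshrosen/app.jar").matches());
    }
}
```

Under a check like this, the only local-file URL that passes validation is the malformed two-slash form, which then fails downstream.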
If I omit the extra slash:
./bin/spark-submit --deploy-mode cluster --master spark://joshs-mbp.att.net:7077 --class org.apache.spark.examples.SparkPi file://Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar

Sending launch command to spark://joshs-mbp.att.net:7077
Driver successfully submitted as driver-20141116143235-0002
... waiting before polling master for driver state
... polling master for driver state
State of driver-20141116143235-0002 is ERROR
Exception from cluster was: java.lang.IllegalArgumentException: Wrong FS: file://Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar, expected: file:///
java.lang.IllegalArgumentException: Wrong FS: file://Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar, expected: file:///
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
	at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
	at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:157)
	at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:74)
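The Wrong FS error is consistent with how standard URI parsing treats the two-slash form: in file://Users/..., the segment after the double slash is interpreted as the URI authority (host) rather than part of the path, so Hadoop's local filesystem sees a URI with authority "Users" instead of the empty-authority file:/// form it expects. A quick demonstration with java.net.URI (the .jar path here is a shortened stand-in):

```java
import java.net.URI;

public class UriAuthorityDemo {
    public static void main(String[] args) {
        // Three slashes: empty authority, the full file path survives.
        URI tripleSlash = URI.create("file:///Users/joshrosen/app.jar");
        System.out.println(tripleSlash.getAuthority()); // null (no authority)
        System.out.println(tripleSlash.getPath());      // /Users/joshrosen/app.jar

        // Two slashes: "Users" is swallowed as the authority and the
        // path loses its leading directory.
        URI doubleSlash = URI.create("file://Users/joshrosen/app.jar");
        System.out.println(doubleSlash.getAuthority()); // Users
        System.out.println(doubleSlash.getPath());      // /joshrosen/app.jar
    }
}
```

So the only URL form the submission client accepts is one that URI-aware code further down the stack (Hadoop's FileSystem.checkPath, per the stack trace) correctly rejects.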
This bug effectively prevents users from using spark-submit in cluster mode to run drivers whose JARs are stored on shared cluster filesystems.