Spark / SPARK-4434

spark-submit cluster deploy mode JAR URLs are broken in 1.1.1


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 1.2.0
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: Deploy, Spark Core
    • Labels: None

    Description

      When submitting a driver using spark-submit in cluster mode, Spark 1.1.0 allowed you to omit the file:// or hdfs:// prefix from the application JAR URL, e.g.

      ./bin/spark-submit --deploy-mode cluster --master spark://joshs-mbp.att.net:7077 --class org.apache.spark.examples.SparkPi /Users/joshrosen/Documents/old-spark-releases/spark-1.1.0-bin-hadoop1/lib/spark-examples-1.1.0-hadoop1.0.4.jar
      

      In Spark 1.1.1 and 1.2.0, this same command now fails with an error:

      ./bin/spark-submit --deploy-mode cluster --master spark://joshs-mbp.att.net:7077 --class org.apache.spark.examples.SparkPi /Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar
      Jar url 'file:/Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar' is not in valid format.
      Must be a jar file path in URL format (e.g. hdfs://XX.jar, file://XX.jar)
      
      Usage: DriverClient [options] launch <active-master> <jar-url> <main-class> [driver options]
      Usage: DriverClient kill <active-master> <driver-id>
      

      I tried changing my URL to conform to the new format, but each attempt either produced a validation error or a driver that failed:

      ./bin/spark-submit --deploy-mode cluster --master spark://joshs-mbp.att.net:7077 --class org.apache.spark.examples.SparkPi file:///Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar
      Jar url 'file:///Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar' is not in valid format.
      Must be a jar file path in URL format (e.g. hdfs://XX.jar, file://XX.jar)
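
      The rejection of even the well-formed file:/// URL suggests the 1.1.1 validator matched the string textually rather than parsing it as a URI. For illustration only (this is a hypothetical check, not the actual Spark patch), a URI-based validation would accept both the hdfs:// and file:/// spellings:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class JarUrlCheck {
    // Hypothetical validator (illustrative, not Spark's code): parse the
    // string as a URI instead of pattern-matching on slashes, so a
    // well-formed file:///path/app.jar passes alongside hdfs:// URLs.
    static boolean isLikelyValidJarUrl(String s) {
        try {
            URI uri = new URI(s);
            return uri.getScheme() != null
                && uri.getPath() != null
                && uri.getPath().endsWith(".jar");
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLikelyValidJarUrl("file:///Users/joshrosen/app.jar"));    // true
        System.out.println(isLikelyValidJarUrl("hdfs://namenode:8020/jars/app.jar")); // true
        System.out.println(isLikelyValidJarUrl("not a url"));                          // false
    }
}
```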
      

      If I omit the extra slash:

      ./bin/spark-submit --deploy-mode cluster --master spark://joshs-mbp.att.net:7077 --class org.apache.spark.examples.SparkPi file://Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar
      Sending launch command to spark://joshs-mbp.att.net:7077
      Driver successfully submitted as driver-20141116143235-0002
      ... waiting before polling master for driver state
      ... polling master for driver state
      State of driver-20141116143235-0002 is ERROR
      Exception from cluster was: java.lang.IllegalArgumentException: Wrong FS: file://Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar, expected: file:///
      java.lang.IllegalArgumentException: Wrong FS: file://Users/joshrosen/Documents/Spark/examples/target/scala-2.10/spark-examples_2.10-1.1.2-SNAPSHOT.jar, expected: file:///
      	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
      	at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
      	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393)
      	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
      	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
      	at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:157)
      	at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:74)
      

      This bug effectively prevents users from using spark-submit in cluster mode to run drivers whose JARs are stored on shared cluster filesystems.
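
      The Wrong FS error is consistent with standard RFC 3986 URI parsing: in the two-slash form file://Users/..., the first path segment "Users" is taken as the URI authority (host), so Hadoop's RawLocalFileSystem, which expects an empty authority (file:///), refuses the path. A small demonstration with java.net.URI (the class here is illustrative, not Spark code):

```java
import java.net.URI;

public class FileUriDemo {
    public static void main(String[] args) throws Exception {
        // Two slashes: "Users" is parsed as the URI authority (host),
        // which is why Hadoop's RawLocalFileSystem reports "Wrong FS".
        URI twoSlash = new URI("file://Users/joshrosen/app.jar");
        System.out.println("authority=" + twoSlash.getAuthority()
            + " path=" + twoSlash.getPath());
        // prints: authority=Users path=/joshrosen/app.jar

        // Three slashes: empty authority, so the whole local path survives.
        URI threeSlash = new URI("file:///Users/joshrosen/app.jar");
        System.out.println("authority=" + threeSlash.getAuthority()
            + " path=" + threeSlash.getPath());
        // prints: authority=null path=/Users/joshrosen/app.jar
    }
}
```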


    People

      Assignee: Andrew Or (andrewor14)
      Reporter: Josh Rosen (joshrosen)
      Votes: 0
      Watchers: 7
