Spark / SPARK-21618

http(s) not accepted in spark-submit jar uri

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.1.1, 2.2.0
    • Fix Version/s: None
    • Component/s: Deploy
    • Labels:
    • Environment: Pre-built for Hadoop 2.6 and 2.7, on Mac and Ubuntu 16.04.

      Description

      The documentation (https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management) suggests I should be able to pass an http(s) URI for the jar to spark-submit, but it fails with "No FileSystem for scheme: https":

      benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master local[2] --class class.name.Test https://test.com/path/to/jar.jar
      log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
      log4j:WARN Please initialize the log4j system properly.
      log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
      Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
      	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
      	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
      	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
      	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
      	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
      	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
      	at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
      	at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
      	at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
      	at scala.Option.map(Option.scala:146)
      	at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
      	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
      	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
      	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      benmayne@Benjamins-MacBook-Pro ~ $
      

      If I replace the URI with a valid HDFS path (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I see the same failure on 2.2.0 (Hadoop 2.6 and 2.7, on Mac and Ubuntu) and on 2.1.1 on Ubuntu.
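Since local and hdfs: paths work in client mode while https: does not, one possible workaround is to stage the jar locally before calling spark-submit. The sketch below is only an illustration under assumptions: `resolve_jar` is a hypothetical helper (not part of Spark), and it assumes `curl` and `mktemp` are available on the submitting machine.

```shell
# Hypothetical helper, not part of Spark: print a jar location that
# spark-submit can be given directly.
# http:// and https:// URIs are downloaded into a fresh temp directory;
# anything else (local paths, file:, hdfs:, ...) is printed unchanged.
resolve_jar() {
  local uri dir name dest
  uri="$1"
  case "$uri" in
    http://*|https://*)
      dir="$(mktemp -d)" || return 1
      # Drop any query string before taking the file name.
      name="$(basename "${uri%%\?*}")"
      dest="$dir/$name"
      # -f: fail on HTTP errors, -sS: quiet but report errors, -L: follow redirects
      curl -fsSL -o "$dest" "$uri" || return 1
      printf '%s\n' "$dest"
      ;;
    *)
      printf '%s\n' "$uri"
      ;;
  esac
}

# Usage (assumes the URL is reachable; the class and URL are the ones from the report):
#   jar="$(resolve_jar https://test.com/path/to/jar.jar)" &&
#     spark-submit --deploy-mode client --master 'local[2]' --class class.name.Test "$jar"
```

Quoting 'local[2]' keeps the shell from treating the brackets as a glob pattern. The downloaded copy is not cleaned up here; a real script would probably remove the temp directory once the job exits.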

      This is the example that I'm trying to replicate, from https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management:

      > Spark uses the following URL scheme to allow different strategies for disseminating jars:
      > file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.
      > hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected

      # Run on a Mesos cluster in cluster deploy mode with supervise
      ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master mesos://207.184.161.138:7077 \
        --deploy-mode cluster \
        --supervise \
        --executor-memory 20G \
        --total-executor-cores 100 \
        http://path/to/examples.jar \
        1000
      

              People

              • Assignee: Unassigned
              • Reporter: Ben Mayne (benmayne)
              • Votes: 0
              • Watchers: 4
