Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.1.1, 2.2.0
- Fix Version/s: None
- Environment: pre-built for Hadoop 2.6 and 2.7 on macOS and Ubuntu 16.04
Description
The documentation (https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management) suggests I should be able to use an http(s) URI for a jar with spark-submit, but I haven't been successful:

```
benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master local[2] \
    --class class.name.Test https://test.com/path/to/jar.jar
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
	at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
	at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
	at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
benmayne@Benjamins-MacBook-Pro ~ $
```
If I replace the path with a valid HDFS path (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the same behavior with 2.2.0 (Hadoop 2.6 and 2.7 on macOS and Ubuntu) and with 2.1.1 on Ubuntu.
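Until http(s) URIs are handled, one possible workaround is to download the artifact to a local path first and hand that to spark-submit. The helper below is an illustrative sketch, not part of Spark: it localizes only http(s) URIs and passes every other scheme (hdfs:, file:, etc.) through untouched.

```python
import shutil
import urllib.parse
import urllib.request

def localize(uri: str, dest_dir: str = "/tmp") -> str:
    """Download http(s) URIs to a local file; pass other schemes through.

    Hypothetical workaround helper -- spark-submit can then be given the
    returned local path instead of the http(s) URI.
    """
    scheme = urllib.parse.urlparse(uri).scheme
    if scheme not in ("http", "https"):
        # hdfs:, file:, s3: ... are already handled by spark-submit
        return uri
    local_path = dest_dir.rstrip("/") + "/" + uri.rsplit("/", 1)[-1]
    with urllib.request.urlopen(uri) as resp, open(local_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return local_path
```

The returned path can then be used in place of the https URI, e.g. `spark-submit --deploy-mode client --master local[2] --class class.name.Test /tmp/jar.jar`.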
This is the example I'm trying to replicate, from https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management:
> Spark uses the following URL scheme to allow different strategies for disseminating jars:
> file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.
> hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected
```
# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000
```
Issue Links
- Is contained by: SPARK-21012 Support glob path for resources adding to Spark (Resolved)