[SPARK-21618] http(s) not accepted in spark-submit jar uri - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: 2.1.1, 2.2.0
Fix Version/s: None
Component/s: Deploy
Labels:
- documentation
Environment: Pre-built for Hadoop 2.6 and 2.7, on Mac and Ubuntu 16.04.

Description

The documentation (https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management) suggests that spark-submit accepts an http(s) URI for the application jar, but submitting one fails with "No FileSystem for scheme: https":

benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master local[2] --class class.name.Test https://test.com/path/to/jar.jar
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
	at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
	at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
	at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
benmayne@Benjamins-MacBook-Pro ~ $

If I replace the URL with a valid HDFS path (hdfs:///user/benmayne/valid-jar.jar), the submission works as expected. I see the same failure on 2.2.0 (pre-built for Hadoop 2.6 and 2.7, on both Mac and Ubuntu) and on 2.1.1 on Ubuntu.
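
Until http(s) URIs are accepted, one possible workaround in client mode is to fetch the jar first and pass spark-submit a local path. The sketch below is not part of Spark: the function names uri_scheme and resolve_jar are hypothetical, and it assumes curl and mktemp are available and that the URL ends in the jar's file name.

```shell
# Workaround sketch (hypothetical helpers, not part of Spark).

# Print the scheme of a URI ("https" for https://host/x.jar), or an empty
# line if the argument has no "scheme://" prefix.
uri_scheme() {
  case "$1" in
    *://*) printf '%s\n' "${1%%://*}" ;;
    *)     printf '\n' ;;
  esac
}

# Print a path that spark-submit can read: http(s) URIs are downloaded into
# a fresh temporary directory; anything else is passed through unchanged.
resolve_jar() {
  local uri="$1" dir
  case "$(uri_scheme "$uri")" in
    http|https)
      dir="$(mktemp -d)" || return 1
      # -f: fail on HTTP errors, -sS: quiet but report errors, -L: follow redirects
      curl -fsSL -o "$dir/${uri##*/}" "$uri" || return 1
      printf '%s\n' "$dir/${uri##*/}"
      ;;
    *)
      printf '%s\n' "$uri"
      ;;
  esac
}

# Usage, mirroring the failing command from this report:
#   jar="$(resolve_jar https://test.com/path/to/jar.jar)" &&
#     spark-submit --deploy-mode client --master 'local[2]' \
#       --class class.name.Test "$jar"
```

Paths that already work, such as the hdfs:/// one above, pass through unchanged, so the same wrapper can be used for every submission. The temporary directory is left in place and has to be cleaned up by the caller.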

This is the example I'm trying to replicate, from https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management:

> Spark uses the following URL scheme to allow different strategies for disseminating jars:
> file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.
> hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

Issue Links

Is contained by: SPARK-21012 Support glob path for resources adding to Spark (Resolved)

People

Assignee: Unassigned

Reporter: Ben Mayne

Votes: 0

Watchers: 4

Dates

Created: 03/Aug/17 01:13

Updated: 08/Aug/17 20:06

Resolved: 03/Aug/17 22:40