Details
Bug
Status: Resolved
Major
Resolution: Incomplete
2.3.0
None
Description
To reproduce this issue run:
./bin/spark-submit --master mesos://leader.mesos:5050 \
--packages com.github.scopt:scopt_2.11:3.5.0 \
--conf spark.cores.max=8 \
--conf spark.mesos.executor.docker.image=mesosphere/spark:beta-2.1.1-2.2.0-2-hadoop-2.6 \
--conf spark.mesos.executor.docker.forcePullImage=true \
--class S3Job http://s3-us-west-2.amazonaws.com/arand-sandbox-mesosphere/dcos-spark-scala-tests-assembly-0.1-SNAPSHOT.jar \
--readUrl s3n://arand-sandbox-mesosphere/big.txt --writeUrl s3n://arand-sandbox-mesosphere/linecount.out
inside a container created from the mesosphere/spark:beta-2.1.1-2.2.0-2-hadoop-2.6 image.
You will get: "Exception in thread "main" java.io.IOException: No FileSystem for scheme: s3n"
This can be reproduced with local[*] as well; Mesos is not required, and this is not a Mesos bug.
The specific spark job used above can be found here: https://github.com/mesosphere/spark-build/blob/d5c50e9ae3b1438e0c4ba96ff9f36d5dafb6a466/tests/jobs/scala/src/main/scala/S3Job.scala
It can be built with sbt assembly in that directory.
Using this code: https://gist.github.com/skonto/4f5ff1e5ede864f90b323cc20bf1e1cb at the beginning of the main method,
you get the following output: https://gist.github.com/skonto/d22b8431586b6663ddd720e179030da4
(Use http://s3-eu-west-1.amazonaws.com/fdp-stavros-test/dcos-spark-scala-tests-assembly-0.1-SNAPSHOT.jar to get the modified job.)
The job works fine if --packages is not used.
The commit that introduced this issue is the following (before it, things work as expected):
5800144a54f5c0180ccf67392f32c3e8a51119b1 - SPARK-21012[SUBMIT] Add glob support for resources adding to Spark (5 months ago) <jerryshao> Thu, 6 Jul 2017 15:32:49 +0800
The exception comes from here: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L3311
See line 950 of https://github.com/apache/spark/pull/18235/files; this is where a filesystem is first created.
The FileSystem class is initialized there, before the main method of the Spark job is launched, because the --packages logic uses Hadoop libraries to download files.
Maven resolution happens before the app jar and the resolved jars are added to the classpath. So at that moment there is no s3n implementation to add to the static map (SERVICE_FILE_SYSTEMS), which is filled when the FileSystem static members are first initialized and the first FileSystem instance is created.
Later, in the main method of the Spark job, where we try to access the s3n filesystem (creating a second filesystem), we get the exception. At this point the app jar contains the s3n implementation and it is on the classpath, but that scheme was never loaded into the static map of the FileSystem class.
hadoopConf.set("fs.s3n.impl.disable.cache", "true") has no effect since the problem is with the static map which is filled once and only once.
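The "filled once and only once" behaviour described above can be illustrated with a small self-contained sketch. This is not Hadoop code: StaticRegistryDemo, visibleSchemes, REGISTRY and lookup are hypothetical stand-ins for the classpath, the static scheme map and the filesystem lookup, and the real loading logic in FileSystem may differ in detail.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT Hadoop's FileSystem: it only mimics a static
// scheme registry that is populated a single time.
class StaticRegistryDemo {
    // Stand-in for the classpath: schemes whose implementations are visible right now.
    static final List<String> visibleSchemes = new ArrayList<>();

    // Stand-in for the static scheme map; filled on first use and never refreshed.
    private static final Map<String, String> REGISTRY = new HashMap<>();
    private static boolean loaded = false;

    private static synchronized void loadOnce() {
        if (!loaded) {
            for (String scheme : visibleSchemes) {
                REGISTRY.put(scheme, scheme + "-impl");
            }
            loaded = true; // later additions to visibleSchemes are never picked up
        }
    }

    static String lookup(String scheme) throws IOException {
        loadOnce();
        String impl = REGISTRY.get(scheme);
        if (impl == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) throws Exception {
        visibleSchemes.add("file");
        visibleSchemes.add("hdfs");
        lookup("file");            // first lookup (think: the --packages download step)
        visibleSchemes.add("s3n"); // app jar reaches the classpath only afterwards
        try {
            lookup("s3n");
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "No FileSystem for scheme: s3n"
        }
    }
}
```

In this sketch, any setting that affects only later steps (as the cache flag does for already-created instances) cannot help, because the lookup fails at the registry itself.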
That's why we see two prints of the map contents in the output (gist) above when --packages is used. The first print is before creating the s3n filesystem. We use reflection there to get the static map's entries. When --packages is not used, that map is empty before creating the s3n filesystem, since up to that point the FileSystem class has not yet been loaded by the classloader.
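For readers who do not want to open the gist, reading a private static map through reflection can be sketched roughly as follows. This is a generic illustration, not the gist's code: StaticMapPeek, Holder and readStaticMap are hypothetical names. Against Hadoop one would presumably pass org.apache.hadoop.fs.FileSystem.class and "SERVICE_FILE_SYSTEMS" (the field named above); that has not been verified here.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class StaticMapPeek {
    // Hypothetical stand-in for a class that keeps a private static map.
    static class Holder {
        private static final Map<String, String> SCHEMES = new HashMap<>();
        static {
            SCHEMES.put("file", "LocalFileSystem");
        }
    }

    // Reads a (possibly private) static Map field by name.
    static Map<?, ?> readStaticMap(Class<?> cls, String fieldName)
            throws ReflectiveOperationException {
        Field field = cls.getDeclaredField(fieldName);
        field.setAccessible(true);       // bypass the private modifier
        return (Map<?, ?>) field.get(null); // null receiver because the field is static
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readStaticMap(Holder.class, "SCHEMES")); // prints {file=LocalFileSystem}
    }
}
```

Printing the map this way before and after the first filesystem is created is enough to see whether a scheme was registered at load time.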