Spark / SPARK-22587

Spark job fails if fs.defaultFS and the application jar URL are different

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.3
    • Fix Version/s: 2.3.0
    • Component/s: Spark Submit
    • Labels:
      None

      Description

      Spark job fails if fs.defaultFS and the URL where the application jar resides point to different filesystems, even when both share the same scheme.

      spark-submit --conf spark.master=yarn-cluster wasb://XXX/tmp/test.py

      In core-site.xml, fs.defaultFS is set to wasb://YYY. A Hadoop listing (hadoop fs -ls) works against both the XXX and YYY URLs.
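      For context, the default filesystem comes from the Hadoop client configuration; a minimal core-site.xml fragment (with wasb://YYY standing in for the real account URL, as above) would look like:

      ```xml
      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>wasb://YYY</value>
        </property>
      </configuration>
      ```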

      Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: wasb://XXX/tmp/test.py, expected: wasb://YYY 
      at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665) 
      at org.apache.hadoop.fs.azure.NativeAzureFileSystem.checkPath(NativeAzureFileSystem.java:1251) 
      at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:485) 
      at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:396) 
      at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:507) 
      at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:660) 
      at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:912) 
      at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172) 
      at org.apache.spark.deploy.yarn.Client.run(Client.scala:1248) 
      at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1307) 
      at org.apache.spark.deploy.yarn.Client.main(Client.scala) 
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
      at java.lang.reflect.Method.invoke(Method.java:498) 
      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751) 
      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) 
      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) 
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) 
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
      

      Client.copyFileToRemote tries to qualify the application jar's path (the XXX URL) against the FileSystem object created from the fs.defaultFS URL (YYY), instead of against the filesystem of the jar's actual URL.

      val destFs = destDir.getFileSystem(hadoopConf)
      val srcFs = srcPath.getFileSystem(hadoopConf)

      getFileSystem creates the filesystem from the URL of the path, so this part is fine. But the lines below try to qualify srcPath (the XXX URL) against destFs (the YYY URL), and that is what fails.

      var destPath = srcPath
      val qualifiedDestPath = destFs.makeQualified(destPath)
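      The failure can be reproduced without Hadoop at all: FileSystem.checkPath (called from makeQualified) rejects any path whose scheme or authority differs from the filesystem's own URI. Below is a minimal sketch in plain Java using java.net.URI; checkPath here is a simplified stand-in for the Hadoop method, not its actual implementation:

      ```java
      import java.net.URI;

      public class WrongFsDemo {
          // Simplified stand-in for org.apache.hadoop.fs.FileSystem.checkPath:
          // a filesystem only accepts paths whose scheme and authority match its own URI.
          static void checkPath(URI fsUri, URI path) {
              boolean sameScheme = path.getScheme() == null
                      || path.getScheme().equalsIgnoreCase(fsUri.getScheme());
              boolean sameAuthority = path.getAuthority() == null
                      || path.getAuthority().equalsIgnoreCase(fsUri.getAuthority());
              if (!(sameScheme && sameAuthority)) {
                  throw new IllegalArgumentException(
                          "Wrong FS: " + path + ", expected: " + fsUri);
              }
          }

          public static void main(String[] args) {
              URI destFs = URI.create("wasb://YYY");              // filesystem built from fs.defaultFS
              URI srcPath = URI.create("wasb://XXX/tmp/test.py"); // application jar URL

              // Qualifying srcPath against destFs fails, as in the stack trace above:
              try {
                  checkPath(destFs, srcPath);
              } catch (IllegalArgumentException e) {
                  System.out.println(e.getMessage());
              }

              // Qualifying against the path's own filesystem succeeds:
              URI srcFs = URI.create("wasb://XXX");
              checkPath(srcFs, srcPath); // no exception
          }
      }
      ```

      In other words, the fix direction is to qualify srcPath against srcFs (the filesystem derived from the path's own URL) rather than against destFs.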


    People

    • Assignee: Mingjie Tang (merlin)
    • Reporter: Prabhu Joseph
    • Votes: 0
    • Watchers: 6
