Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Description
We recently found that when -libjars specifies jars that already live on the same remote FS as the job submission directory, those jars are not properly added to the classpath.
The reason is that MAPREDUCE-6719 added the wildcard functionality, but the following logic assumes all files are placed under the job's submission directory (inside JobResourceUploader):
if (useWildcard && !foundFragment) {
  // Add the whole directory to the cache using a wild card
  Path libJarsDirWildcard =
      jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));
  DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
}
However, in the same method, the specified resources are only uploaded when the two filesystems differ; see copyRemoteFiles:
if (FileUtil.compareFs(remoteFs, jtFs)) {
  return originalPath;
}
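Putting the two snippets together shows the failure mode. Below is a minimal, self-contained sketch in plain Java (no Hadoop dependencies; the method name buildCacheEntries and the staging paths are illustrative, not actual Hadoop APIs) of how the upload decision and the wildcard cache entry interact:

```java
import java.util.ArrayList;
import java.util.List;

public class LibjarsWildcardSketch {
    /**
     * Simulates the JobResourceUploader behavior described above and
     * returns the distributed-cache entries that would be registered.
     */
    static List<String> buildCacheEntries(String jtFsScheme, String libjarsDir,
                                          List<String> libjars, boolean useWildcard) {
        List<String> cache = new ArrayList<>();
        for (String jar : libjars) {
            String scheme = jar.substring(0, jar.indexOf("://"));
            // copyRemoteFiles: if the jar's FS equals the job-tracker FS,
            // the original path is returned and nothing lands in libjarsDir.
            String uploaded = scheme.equals(jtFsScheme)
                    ? jar
                    : libjarsDir + "/" + jar.substring(jar.lastIndexOf('/') + 1);
            if (!useWildcard) {
                cache.add(uploaded); // per-file entry: always points at the jar
            }
        }
        if (useWildcard) {
            // The wildcard covers only libjarsDir; jars that stayed at their
            // original remote locations are silently left off the classpath.
            cache.add(libjarsDir + "/*");
        }
        return cache;
    }

    public static void main(String[] args) {
        List<String> jars = List.of("wasb://host/path1/jar1", "wasb://host/path2/jar2");
        // Same FS as the submission dir + wildcard enabled: only the (empty)
        // staging wildcard is cached, so jar1 and jar2 are missing.
        System.out.println(buildCacheEntries("wasb", "wasb://host/staging/libjars", jars, true));
    }
}
```

With the wildcard disabled, the per-file entries keep their correct (original or uploaded) paths, which is why the workaround below is effective.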
The workaround for this issue is to set mapreduce.client.libjars.wildcard = false when launching the MR job.
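For example, the property can be passed on the command line (assuming the main class uses ToolRunner/GenericOptionsParser so that -D options are honored; abc.jar and org.ABC are the placeholder names from the reproduction example):

```
hadoop jar abc.jar org.ABC \
  -Dmapreduce.client.libjars.wildcard=false \
  -libjars "wasb://host/path1/jar1,wasb://host/path2/jar2"
```

This disables the wildcard shortcut, so each libjar gets its own distributed-cache entry pointing at its real location.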
An example command line to reproduce this issue:
hadoop jar abc.jar org.ABC -libjars "wasb://host/path1/jar1,wasb://host/path2/jar2..."
Issue Links
is caused by: MAPREDUCE-6719 The list of -libjars archives should be replaced with a wildcard in the distributed cache to reduce the application footprint in the state store (Resolved)