Details
Bug
Status: Open
Major
Resolution: Unresolved
linux ("path.separator" is ":")
hdfs filesystem (not "local")
Description
When paths that include a scheme or port (such as "hdfs://localhost:9000/deploy/hello") are passed to DistributedCache.addFileToClassPath, they are appended to the configuration option "mapred.job.classpath.files" using the "path.separator" delimiter, which is ":".
This confuses DistributedCache.getFileClassPaths: the same character both separates the parts of a single Path and delimits whole paths.
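The collision can be reproduced without Hadoop. The sketch below is only an illustration: the class SeparatorCollision and the helper splitClassPath are hypothetical names, and the code assumes the stored value is split on the separator in the obvious way. It is not the actual DistributedCache implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SeparatorCollision {

    // Hypothetical helper: splits a stored classpath value on the separator,
    // the way a naive reader of "mapred.job.classpath.files" might.
    static List<String> splitClassPath(String value, String separator) {
        return Arrays.asList(value.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        String separator = ":"; // value of "path.separator" on Linux
        String stored = "hdfs://localhost:9000/deploy/hello";

        // A single fully qualified path is broken into three bogus entries.
        System.out.println(splitClassPath(stored, separator));
        // prints [hdfs, //localhost, 9000/deploy/hello]
    }
}
```

The last fragment, "9000/deploy/hello", matches the beginning of the incorrect path reported in this issue.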
Example:
I have some jars and conf files in the HDFS directory "/deploy". The following code adds them to the job's classpath:
Path deployPath = new Path("/deploy");
FileSystem fs = deployPath.getFileSystem(new Configuration());
FileStatus[] jars = fs.listStatus(deployPath);
for (int i = 0; i < jars.length; i++) {
    System.out.println(jars[i].getPath());
    DistributedCache.addFileToClassPath(jars[i].getPath(), job);
}
Launching the task gives the following stdout output:
hdfs://localhost:9000/deploy/hello
DistributedCache then sets "mapred.job.classpath.files" to "hdfs://localhost:9000/deploy/hello".
As a result, DistributedCache.getFileClassPaths returns incorrect paths such as "9000/deploy/hello/home/gudok/Work/test/bin/../conf".
For now, I've worked around the problem by submitting Paths without the scheme and port ("/deploy/hello").
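A minimal sketch of that workaround, using only java.net.URI so it runs without Hadoop on the classpath. The class StripScheme and the helper pathOnly are hypothetical names introduced for illustration. In a real job the equivalent would presumably be new Path(p.toUri().getPath()), which assumes the job's default filesystem is the same HDFS instance the files live on.

```java
import java.net.URI;

public class StripScheme {

    // Hypothetical helper: drops the scheme, host and port from a location,
    // keeping only the path component, which contains no ":" here.
    static String pathOnly(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        System.out.println(pathOnly("hdfs://localhost:9000/deploy/hello"));
        // prints /deploy/hello

        // A location that is already scheme-less is returned unchanged.
        System.out.println(pathOnly("/deploy/hello"));
        // prints /deploy/hello
    }
}
```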
Other DistributedCache methods need to be reviewed too.