Affects Version/s: None
Fix Version/s: None
linux ("path.separator" is ":")
hdfs filesystem (not "local")
When paths that include a scheme or port (such as
"hdfs://localhost:9000/deploy/hello") are passed to DistributedCache.addFileToClassPath, they are appended to the configuration option "mapred.job.classpath.files" using "path.separator" as the delimiter, which is ":" on Linux.
This breaks DistributedCache.getFileClassPaths: the same character separates the components within a single path and the paths from one another, so the stored value cannot be split back into the original paths.
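The ambiguity can be reproduced without Hadoop. The following is a minimal sketch in plain Java, not the actual DistributedCache code: the class and method names are hypothetical, and it assumes only what is described above, namely that entries are joined with ":" and later split on the same character. The second path is a made-up example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the join/split behaviour described in this report.
class ClasspathSeparatorDemo {

    // Joins classpath entries with a single separator string.
    static String join(List<String> entries, String separator) {
        return String.join(separator, entries);
    }

    // Splits the stored value on the same separator (treated literally).
    static String[] split(String stored, String separator) {
        return stored.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String separator = ":"; // value of "path.separator" on Linux

        // First path is from this report; the second is an invented example.
        String stored = join(Arrays.asList(
                "hdfs://localhost:9000/deploy/hello",
                "hdfs://localhost:9000/deploy/other.jar"), separator);

        // Each URI is cut into three fragments: scheme, host, and port plus path.
        for (String part : split(stored, separator)) {
            System.out.println(part);
        }
    }
}
```

Two entries go in and six fragments come out, one of which is "9000/deploy/hello", the same prefix seen in the incorrect paths reported below.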
I have some jars and configuration files in the HDFS directory "/deploy". The following code adds them to the job's classpath:
Launching the task gives the following stdout output:
And "mapred.job.classpath.files" is set to "hdfs://localhost:9000/deploy/hello" by DistributedCache.
And DistributedCache.getFileClassPaths returns incorrect paths like "9000/deploy/hello/home/gudok/Work/test/bin/../conf".
For now, I've worked around this problem by submitting Paths without a scheme and port ("/deploy/hello").
Other DistributedCache methods need to be reviewed too.
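One way to apply this workaround is to reduce each location to its path component before handing it to DistributedCache. This is a rough sketch using only java.net.URI from the JDK; the class and helper names are hypothetical, and it assumes the default filesystem of the job already points at the same HDFS instance, so dropping the scheme and port loses no information.

```java
import java.net.URI;

// Hypothetical helper: keeps only the path part of a location string.
class SchemelessPath {

    // Drops the scheme, host and port, e.g. "hdfs://localhost:9000/deploy/hello"
    // becomes "/deploy/hello". Locations without a scheme are returned unchanged.
    static String stripSchemeAndAuthority(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        System.out.println(stripSchemeAndAuthority("hdfs://localhost:9000/deploy/hello"));
        System.out.println(stripSchemeAndAuthority("/deploy/hello"));
    }
}
```

The resulting string contains no ":" and can be wrapped in a Path and submitted as before.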