Thanks Vinod Kumar Vavilapalli for comments!
Users can use URI fragments in distributed-cache - for e.g. hdfs://a/b/c#symlink. Either we should explicitly reject fragments or support them. See MRApps.parseDistributedCacheArtifacts() for similar changes that we do for regular dist-cache files.
IMO, URI fragment doesn't make too much sense here as we are hiding details for how log4j.properties upload to hdfs and get download to each node's local cache and put it on classpath. MR over distributed cache support fragment as user need to explicitly set class path that included in tar ball and symlink for directory is easily to reference. In this case, user only need to set one value here to point to local filesystem. If you still think fragment is required, please file a separated JIRA instead of reopening this.
If the user passes his own log4j file, setting YARN_APP_CONTAINER_LOG_SIZE, YARN_APP_CONTAINER_LOG_BACKUPS and logLevel may not matter at all. But let's set them as you do now, just leave a code comment that setting them may or may not really take effect.
Agree we should put this on documentation.
job.setWorkingDirectory() getting called repeatedly?
As Zhijie's raised above, we decided to address the consolidation of some code flow here in MAPREDUCE-6148.
For the null-scheme input, "File " + file + " does not exist." -> "File " + file + " does not exist on the local file-system."
Nice catch! Will address this minor issue in refactoring JIRA.
Delete this log statement that you might have added for your testing: LOG.debug("default FileSystem: " + jtFs.getUri());
I think it is helpful and prefer to leave it here which doesn't affect system runtime performance. Right?
What happens if the user passes a relative path for MAPREDUCE_JOB_LOG4J_PROPERTIES_FILE? I think it works. If so, let's put this in the documentation too.
Relative path will base on current user's working directory. Let's document it.
Other documentation comments make sense to me. Given most of comments are documentation, let's resolve this and file a separated document JIRA to track.