Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.6.0
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      For the current MR application, "log4j.configuration" is hard-coded to container-log4j.properties on each node. We still need the flexibility to override it per job, as we could in MRv1.

        public static void addLog4jSystemProperties(
            String logLevel, long logSize, int numBackups, List<String> vargs) {
          vargs.add("-Dlog4j.configuration=container-log4j.properties");
          // ... (excerpt; the method continues)
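
      For illustration, here is a minimal sketch of the shape the fix takes, assuming (per the review discussion below) that the patched method gains a Configuration parameter and a new "mapreduce.job.log4j-properties-file" key; this is a sketch, not the committed patch:

        import java.util.List;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;

        public final class Log4jArgsSketch {
          // Hypothetical constant mirroring the property discussed in this JIRA.
          static final String LOG4J_PROPERTIES_FILE =
              "mapreduce.job.log4j-properties-file";

          public static void addLog4jSystemProperties(String logLevel,
              long logSize, int numBackups, List<String> vargs,
              Configuration conf) {
            String log4jPropertyFile = conf.get(LOG4J_PROPERTIES_FILE, "");
            if (log4jPropertyFile.isEmpty()) {
              // Default: the file shipped with the NodeManager.
              vargs.add("-Dlog4j.configuration=container-log4j.properties");
            } else {
              // Per-job override: point log4j at the file name that the
              // distributed cache localizes into the container's directory.
              vargs.add("-Dlog4j.configuration="
                  + new Path(log4jPropertyFile).getName());
            }
          }
        }
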
      Attachments

      1. MAPREDUCE-6052.patch
        7 kB
        Junping Du
      2. MAPREDUCE-6052-v2.patch
        8 kB
        Junping Du
      3. MAPREDUCE-6052-v3.patch
        15 kB
        Junping Du
      4. MAPREDUCE-6052-v4.patch
        15 kB
        Junping Du

          Activity

          Junping Du added a comment -

          Attached a quick patch to fix it.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12664543/MAPREDUCE-6052.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4828//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4828//console

          This message is automatically generated.

          Xuan Gong added a comment -

          Thanks for the patch, Junping Du.
          Overall looks good, but I still have one question:
          Currently, we have the same container-log4j.properties on all NMs. No matter which NM a container is launched on, it will always get the same container-log4j.properties. So, if we use a customized log4j properties file, do we need to distribute this file to all the NMs (just like the default container-log4j.properties)? Otherwise, if the customized log4j properties file exists only on the local file system, I am afraid containers launched on other NMs cannot get it.
          Maybe you could test the patch on a multi-node cluster to verify it.

          Vinod Kumar Vavilapalli added a comment -

          No matter which NM a container is launched on, it will always get the same container-log4j.properties. So, if we use a customized log4j properties file, do we need to distribute this file to all the NMs (just like the default container-log4j.properties)?

          Was asked to comment. This makes sense. Let's just make it a user-configuration of a local log4j file that we accept and add it automatically to the distributed-cache.

          Junping Du added a comment -

          Was asked to comment. This makes sense. Let's just make it a user-configuration of a local log4j file that we accept and add it automatically to the distributed-cache.

          Hi Vinod Kumar Vavilapalli, can you clarify this a bit more? I think the user can explicitly pass "-files log4j.properties" to add the file to the distributed cache and deliver it to each node, right? So maybe what we should do here is add the parent directory of the distributed-cache files to the classpath by default. Attaching a patch to demonstrate this.
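
          As a concrete illustration of the "-files" route (a sketch; MyJob and the paths are hypothetical), a driver built on ToolRunner picks up the generic option, so the file ships to every task via the distributed cache:

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.conf.Configured;
            import org.apache.hadoop.util.Tool;
            import org.apache.hadoop.util.ToolRunner;

            public class MyJob extends Configured implements Tool {
              @Override
              public int run(String[] args) throws Exception {
                // By the time run() is called, ToolRunner has consumed generic
                // options such as -files and registered the file with the
                // distributed cache via the Configuration; job setup elided.
                return 0;
              }

              public static void main(String[] args) throws Exception {
                // Invoked as: hadoop jar myjob.jar MyJob -files log4j.properties <in> <out>
                System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
              }
            }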

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12676100/MAPREDUCE-6052-v2.patch
          against trunk revision 171f237.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4972//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4972//console

          This message is automatically generated.

          Vinod Kumar Vavilapalli added a comment -

          Hi Vinod Kumar Vavilapalli, can you clarify this a bit more? I think the user can explicitly pass "-files log4j.properties" to add the file to the distributed cache and deliver it to each node, right? So maybe what we should do here is add the parent directory of the distributed-cache files to the classpath by default. Attaching a patch to demonstrate this.

          It's weird that we only let users configure the name of the log4j config file but don't let them specify the file itself. I'd rather have an option to simply say "hey, this is my log4j configuration file on HDFS, go use it".

          Vinod Kumar Vavilapalli added a comment -

          "hey this is my log4j configuration file on HDFS, go use it"

          That or "hey this is my log4j configuration file on the local-disk, go add it to dist-cache and use it for all my tasks".

          Junping Du added a comment -

          That or "hey this is my log4j configuration file on the local-disk, go add it to dist-cache and use it for all my tasks".

          If so, the user can add "-files log4j.properties" as a generic option, which distributes the local file to all target hosts. Right?

          Vinod Kumar Vavilapalli added a comment -

          With your proposal, the user has to (1) set log4j.configuration to the log-file name (a very weird configuration) and then (2) explicitly add the log file to the distributed cache.

          I am proposing that we simply have (0) mapreduce.job.log4j-configuration-file set to file:///home/vinodkv/container-log4j.properties#my-short-name, which is then recognized by JobClient, automatically uploaded to HDFS (similar to job.jar) if it is a local file, and also added to the distributed cache.

          Vinod Kumar Vavilapalli added a comment -

          My proposal, concretely:

          • Add a new config mapreduce.job.log4j-properties-uri or mapreduce.job.log4j-properties-file.
          • JobClient adds this file to the distributed-cache as a class-path file before submission. The 'key' in the distributed-cache is the same URI.
            • If mapreduce.job.log4j-properties-uri is a local file-system URI, the file automatically gets uploaded to HDFS and then gets distributed everywhere.
            • If it is an HDFS location, it is simply distributed everywhere via the dist-cache.
          • The MR AM reads the config property and, if it is set, appends "-Dlog4j.configuration=$(file-name)". Here "file-name" is the path component of the URI, or the URI fragment if present (this is the convention for the distributed-cache too); see the sketch after this list.
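
          A sketch of that file-name derivation (assuming the proposed mapreduce.job.log4j-properties-uri key, which does not exist yet at this point in the thread):

            import java.net.URI;
            import java.net.URISyntaxException;
            import org.apache.hadoop.fs.Path;

            final class Log4jNameSketch {
              // uriString would come from the proposed mapreduce.job.log4j-properties-uri.
              static String log4jFileName(String uriString) throws URISyntaxException {
                URI uri = new URI(uriString);
                // Dist-cache convention: a fragment, if present, names the
                // localized file; otherwise the last path component does.
                return (uri.getFragment() != null)
                    ? uri.getFragment()
                    : new Path(uri.getPath()).getName();
              }
            }

          For file:///home/vinodkv/container-log4j.properties#my-short-name this yields my-short-name, so the AM would append -Dlog4j.configuration=my-short-name.
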
          Junping Du added a comment -

          Sounds good. The v3 patch implements most of what is proposed above, but updates JobSubmitter instead of JobClient, since the new mapreduce API (the mapreduce.* package) is more important for a new feature.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12677109/MAPREDUCE-6052-v3.patch
          against trunk revision f44cf99.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4980//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4980//console

          This message is automatically generated.

          Zhijie Shen added a comment -

          Junping Du, thanks for the patch. In general, the patch looks fine and implements Vinod's proposal. Some comments:

          1. The method has javadoc; it would be good to add a line for the new "conf" parameter.

            public static void addLog4jSystemProperties(
          

          2. Unnecessary import in JobSubmitter

          3. It seems that the ClientDistributedCacheManager calls below don't need to be moved. We can invoke addLog4jToDistributedCache inside copyAndConfigureFiles.

                addLog4jToDistributedCache(job, submitJobDir);

                // set the timestamps of the archives and files
                // set the public/private visibility of the archives and files
                ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(conf);
                // get DelegationToken for cached file
                ClientDistributedCacheManager.getDelegationTokens(conf,
                    job.getCredentials());

          4. Return null too? Otherwise, mapreduce.job.log4j-properties-file = "" will result in an exception?

              if (file.isEmpty()) {
                throw new IllegalArgumentException("File name can't be empty string");
              }
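
          A sketch of that alternative (tolerate an unset/empty value by returning null instead of throwing; the thread below settles on keeping the caller-side emptiness check instead):

              if (file == null || file.isEmpty()) {
                // Treat an unset property as "no per-job log4j file".
                return null;
              }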
          

          5. Is the comment stale?

              // first copy local log4j.properties file to jobtrackers filesystem
          

          6. It seems unnecessary, because submitJobDir should have been created in copyAndConfigureFiles.

              LOG.debug("default FileSystem: " + jtFs.getUri());
              FsPermission mapredSysPerms = 
                new FsPermission(JobSubmissionFiles.JOB_DIR_PERMISSION);
              if (!jtFs.exists(submitJobDir)) {
                submitJobDir = jtFs.makeQualified(submitJobDir);
                submitJobDir = new Path(submitJobDir.toUri().getPath());
                FileSystem.mkdirs(jtFs, submitJobDir, mapredSysPerms);
              }
          

          BTW, would you please comment on how the patch works, if you've conducted an end-to-end local test?

          Junping Du added a comment -

          Thanks for the review and comments, Zhijie Shen!

          The method has javadoc; it would be good to add a line for the new "conf" parameter.

          Nice catch; fixed in the v4 patch.

          Unnecessary import in JobSubmitter

          I don't see any; only some deprecated ones. Would you double-check and point it out here?

          It seems that the ClientDistributedCacheManager calls don't need to be moved. We can invoke addLog4jToDistributedCache inside copyAndConfigureFiles.

          That's a good proposal. Updated in the v4 patch.

          Return null too? Otherwise, mapreduce.job.log4j-properties-file = "" will result in an exception?

          No. mapreduce.job.log4j-properties-file defaults to empty, and we already check that this property is not empty before calling this method; the method here should make sure the path is valid locally and add the complete scheme.

          Is the comment stale? ... It seems unnecessary, because submitJobDir should have been created in copyAndConfigureFiles.

          Good points. Updated both.

          BTW, would you please comment on how the patch works, if you've conducted an end-to-end local test?

          I tried the patch in my local cluster setup; the steps are:
          1. Add the path of a local log4j.properties file to the "mapreduce.job.log4j-properties-file" property in mapred-site.xml.
          2. Submit an MR job. You can see that the file gets delivered to each NM's localized directory and changes the log format for each container of the job.
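
          Equivalently, the property can be set programmatically at submission time (a sketch; the path is hypothetical and must exist on the submitting machine):

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.mapreduce.Job;

            public class SubmitWithLog4jOverride {
              public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Same key as step 1 above; a path with no scheme is treated
                // as a local file.
                conf.set("mapreduce.job.log4j-properties-file",
                    "/home/user/my-log4j.properties");  // hypothetical path
                Job job = Job.getInstance(conf, "log4j-override-demo");
                // ... set mapper/reducer/input/output as usual, then
                // job.waitForCompletion(true);
              }
            }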

          Junping Du added a comment -

          I don't see any; only some deprecated ones. Would you double-check and point it out here?

          Double-checked; removed it in the v4 patch.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12678436/MAPREDUCE-6052-v4.patch
          against trunk revision d1828d9.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4989//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4989//console

          This message is automatically generated.

          Zhijie Shen added a comment -

          The call chain is: copyAndConfigureFiles(2 args) -> copyAndConfigureFiles(3 args) -> addLog4jToDistributedCache -> copyLog4jPropertyFile

          1. In copyAndConfigureFiles, the following code has already been executed; we don't need to invoke it again in addLog4jToDistributedCache, and we can combine addLog4jToDistributedCache and copyLog4jPropertyFile into one method.

                // Set the working directory
                if (job.getWorkingDirectory() == null) {
                  job.setWorkingDirectory(jtFs.getWorkingDirectory());
                }

                short replication = (short) conf.getInt(Job.SUBMIT_REPLICATION, 10);

          2. The same bunch of code has already been executed in copyAndConfigureFiles(3 args); no need to repeat it in copyLog4jPropertyFile, and mapredSysPerms can be passed as an arg.

              LOG.debug("default FileSystem: " + jtFs.getUri());
              FsPermission mapredSysPerms = 
                new FsPermission(JobSubmissionFiles.JOB_DIR_PERMISSION);
              if (!jtFs.exists(submitJobDir)) {
                throw new IOException("Cannot find job submission directory! " 
                    + "It should just be created, so something wrong here.");
              }
          

          3. FileSystem localFs = FileSystem.getLocal(conf); only makes sense when the if condition is true.

              FileSystem localFs = FileSystem.getLocal(conf);
              if (pathURI.getScheme() == null) {
                // default to the local file system;
                // check if the file exists or not first
                if (!localFs.exists(path)) {
                  throw new FileNotFoundException("File " + file + " does not exist.");
                }
                finalPath = path.makeQualified(localFs.getUri(),
                    localFs.getWorkingDirectory()).toString();
              } else {
                // check if the file exists in this file system;
                // we need to recreate this filesystem object to copy these
                // files to the file system the ResourceManager is running on
                FileSystem fs = path.getFileSystem(conf);
                if (!fs.exists(path)) {
                  throw new FileNotFoundException("File " + file + " does not exist.");
                }
                finalPath = path.makeQualified(fs.getUri(),
                    fs.getWorkingDirectory()).toString();
              }
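
          A minimal restructuring along the lines of this comment might look like the following sketch (same behavior; the FileSystem is just resolved once):

              // Resolve the right FileSystem up front: local FS when the URI
              // has no scheme, otherwise the FS named by the URI.
              FileSystem fs = (pathURI.getScheme() == null)
                  ? FileSystem.getLocal(conf)
                  : path.getFileSystem(conf);
              if (!fs.exists(path)) {
                throw new FileNotFoundException("File " + file + " does not exist.");
              }
              finalPath = path.makeQualified(fs.getUri(),
                  fs.getWorkingDirectory()).toString();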
          

          I still have one question about a corner case:

          1. Say we set mapreduce.job.log4j-properties-file=blah/blah/container-log4j.properties; we're going to add the param -Dlog4j.configuration=container-log4j.properties, right? So are we going to use the default container-log4j.properties or the uploaded one? Will the default container-log4j.properties be overridden?

          Junping Du added a comment -

          In copyAndConfigureFiles, the following code has already been executed; we don't need to invoke it again in addLog4jToDistributedCache, and we can combine addLog4jToDistributedCache and copyLog4jPropertyFile into one method.

          Let's keep it here. We keep the flexibility to switch the job working directory or change the replication number in the future. Also, combining the two methods doesn't bring significant benefits; let's leave the existing copyAndConfigureFiles() alone unless we have strong justification for a change.

          The same bunch of code has already been executed in copyAndConfigureFiles(3 args); no need to repeat it in copyLog4jPropertyFile, and mapredSysPerms can be passed as an arg.

          It is safer to keep checking that the directory exists before using it, given possible refactoring or other operations running in parallel.

          FileSystem localFs = FileSystem.getLocal(conf); only makes sense when the if condition is true.

          Nice catch! However, this is a very minor issue, and this JIRA is critical for the 2.6 release, which is pretty close to its end. Can we refactor it later?

          I still have one question about a corner case:

          That's a good question. In your case there is a classic scenario: two files with the same name on the classpath (one loaded from the conf directory by default, the other the uploaded one); which one takes effect? It actually depends on which file appears earlier on the classpath (http://stackoverflow.com/questions/6644440/java-which-of-multiple-resources-on-classpath-jvm-takes), but I'm not sure all JVMs conform to the same spec on this. So the best practice (and what we should recommend to users/customers) is to use a name different from the default one (like log4j.properties, in most cases).
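
          A quick way to check which copy wins on a given JVM (a sketch; prints the first matching classpath resource, which is the one log4j's default lookup would pick):

            import java.net.URL;

            public class WhichLog4jConfig {
              public static void main(String[] args) {
                // getResource returns the first match in classpath order.
                URL winner = Thread.currentThread().getContextClassLoader()
                    .getResource("container-log4j.properties");
                System.out.println("Would load: " + winner);
              }
            }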

          Zhijie Shen added a comment -

          Let's keep it here. We keep the flexibility to switch the job working directory or change the replication number in the future. Also, combining the two methods doesn't bring significant benefits; let's leave the existing copyAndConfigureFiles() alone unless we have strong justification for a change.

          It is safer to keep checking that the directory exists before using it, given possible refactoring or other operations running in parallel.

          There's unnecessary code duplication, though executing it multiple times doesn't result in a bug; it would be better to clean it up. In fact, the replication number and the working dir are read once and used for all the following files/jars in the distributed cache; there should be no problem reusing them for log4j. And while copying files/jars, the configured replication number and working dir shouldn't change, and it doesn't make sense for them to be concurrently modified by others.

          After the redundant code is removed, one method contains so few statements that it can be put directly into the other, but I'm okay if you want to keep them separate.

          two files with the same name on the classpath (one loaded from the conf directory by default, the other the uploaded one); which one takes effect?

          If so, we are probably still going to use the default container-log4j.properties, because it comes with hadoop-yarn-server-nodemanager-2.x.y.jar. It would be good to document this somewhere, in case we run into the issue later.

          Zhijie Shen added a comment -

          Had a quick offline chat with Junping. As there is already code duplication between copying regular files and tarball files, let's just leave the patch as it is, to move fast for 2.6. Will file a separate code-refactoring JIRA to consolidate the copying of regular files (jars), the log4j file, and tarballs.

          Will go ahead and commit the current patch.

          Zhijie Shen added a comment -

          Committed to trunk, branch-2, and branch-2.6. Thanks, Junping! Also thanks for contributing your ideas, Xuan and Vinod!

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6419 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6419/)
          MAPREDUCE-6052. Supported overriding the default container-log4j.properties file per job. Contributed by Junping Du. (zjshen: rev ed63b116465290fdb0acdf89170025f47b307599)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestMapReduceChildJVM.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmissionFiles.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #730 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/730/)
          MAPREDUCE-6052. Supported overriding the default container-log4j.properties file per job. Contributed by Junping Du. (zjshen: rev ed63b116465290fdb0acdf89170025f47b307599)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestMapReduceChildJVM.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmissionFiles.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1919 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1919/)
          MAPREDUCE-6052. Supported overriding the default container-log4j.properties file per job. Contributed by Junping Du. (zjshen: rev ed63b116465290fdb0acdf89170025f47b307599)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestMapReduceChildJVM.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmissionFiles.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1944 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1944/)
          MAPREDUCE-6052. Supported overriding the default container-log4j.properties file per job. Contributed by Junping Du. (zjshen: rev ed63b116465290fdb0acdf89170025f47b307599)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestMapReduceChildJVM.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmissionFiles.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
          Vinod Kumar Vavilapalli added a comment (edited) -

          I am reopening this JIRA for a few issues; please post an addendum patch.

          Code

          • Users can use URI fragments in the distributed-cache, e.g. hdfs://a/b/c#symlink. Either we should explicitly reject fragments or support them (a rejection sketch follows this list). See MRApps.parseDistributedCacheArtifacts() for the similar handling we do for regular dist-cache files.
          • MRApps.addLog4jSystemProperties(): if the user passes their own log4j file, setting YARN_APP_CONTAINER_LOG_SIZE, YARN_APP_CONTAINER_LOG_BACKUPS, and logLevel may not matter at all. But let's set them as you do now; just leave a code comment that setting them may or may not really take effect.
          • job.setWorkingDirectory() getting called repeatedly?
          • For the null-scheme input, "File " + file + " does not exist." -> "File " + file + " does not exist on the local file-system."
          • Delete this log statement that you might have added for your testing: LOG.debug("default FileSystem: " + jtFs.getUri());
          • What happens if the user passes a relative path for MAPREDUCE_JOB_LOG4J_PROPERTIES_FILE? I think it works. If so, let's put that in the documentation too.
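
          For the first item, a sketch of the explicit-rejection option (the key is the one this patch added; the alternative would be to honor the fragment as the localized name, as regular dist-cache files do):

              // file is the value of mapreduce.job.log4j-properties-file.
              URI uri = new URI(file);
              if (uri.getFragment() != null) {
                throw new IllegalArgumentException("URI fragments are not "
                    + "supported in mapreduce.job.log4j-properties-file");
              }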

          Documentation

          • Document the new config in mapred-default.xml.
          • Mention in that documentation that if no scheme is given in the path, it defaults to a log4j file on the local FS.
          • Modify the documentation of the log-level configs to say that if you override with your own log4j.properties file, the log-level configs may not work.
          Junping Du added a comment -

          Thanks Vinod Kumar Vavilapalli for comments!

          Users can use URI fragments in the distributed-cache, e.g. hdfs://a/b/c#symlink. Either we should explicitly reject fragments or support them. See MRApps.parseDistributedCacheArtifacts() for the similar handling we do for regular dist-cache files.

          IMO, URI fragments don't make much sense here, since we hide the details of how log4j.properties is uploaded to HDFS, downloaded to each node's local cache, and put on the classpath. The MR distributed cache supports fragments because users need to explicitly reference classpath entries inside tarballs, and a symlink to a directory is easy to reference; in this case, the user only needs to set a single value pointing at the local file system. If you still think fragments are required, please file a separate JIRA instead of reopening this one.

          If the user passes his own log4j file, setting YARN_APP_CONTAINER_LOG_SIZE, YARN_APP_CONTAINER_LOG_BACKUPS and logLevel may not matter at all. But let's set them as you do now; just leave a code comment that setting them may or may not really take effect.

          Agreed, we should put this in the documentation.

          job.setWorkingDirectory() getting called repeatedly?

          As Zhijie raised above, we decided to address the consolidation of this code flow in MAPREDUCE-6148.

          For the null-scheme input, "File " + file + " does not exist." -> "File " + file + " does not exist on the local file-system."

          Nice catch! Will address this minor issue in the refactoring JIRA.

          Delete this log statement that you might have added for your testing: LOG.debug("default FileSystem: " + jtFs.getUri());

          I think it is helpful and would prefer to leave it here; it doesn't affect runtime performance. Right?

          What happens if the user passes a relative path for MAPREDUCE_JOB_LOG4J_PROPERTIES_FILE? I think it works. If so, let's put that in the documentation too.

          A relative path is resolved against the current user's working directory, as sketched below. Let's document it.
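
          A sketch of that resolution, qualifying a no-scheme relative path against the local file system (the relative path is hypothetical):

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;

            public class RelativePathDemo {
              public static void main(String[] args) throws Exception {
                FileSystem localFs = FileSystem.getLocal(new Configuration());
                Path p = new Path("conf/my-log4j.properties");  // relative, no scheme
                // makeQualified resolves against the submitter's working
                // directory, e.g. file:/home/user/conf/my-log4j.properties.
                System.out.println(
                    p.makeQualified(localFs.getUri(), localFs.getWorkingDirectory()));
              }
            }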

          The other documentation comments make sense to me. Given that most of the comments are about documentation, let's resolve this and file a separate documentation JIRA to track them.

          Junping Du added a comment -

          Filed MAPREDUCE-6149 for documentation.


            People

            • Assignee: Junping Du
            • Reporter: Junping Du
            • Votes: 0
            • Watchers: 7
