Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Currently, when application is finished, NM will start to do the log aggregation. But for Long running service applications, this is not ideal. The problems we have are:
      1) LRS applications are expected to run for a long time (weeks, months).
      2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger.

      1. YARN-2468.11.patch
        61 kB
        Xuan Gong
      2. YARN-2468.10.patch
        61 kB
        Xuan Gong
      3. YARN-2468.9.1.patch
        61 kB
        Xuan Gong
      4. YARN-2468.9.patch
        61 kB
        Xuan Gong
      5. YARN-2468.8.patch
        53 kB
        Xuan Gong
      6. YARN-2468.7.1.patch
        54 kB
        Xuan Gong
      7. YARN-2468.7.patch
        54 kB
        Xuan Gong
      8. YARN-2468.6.1.patch
        108 kB
        Xuan Gong
      9. YARN-2468.6.patch
        107 kB
        Xuan Gong
      10. YARN-2468.5.4.patch
        95 kB
        Xuan Gong
      11. YARN-2468.5.3.patch
        95 kB
        Xuan Gong
      12. YARN-2468.5.2.patch
        85 kB
        Xuan Gong
      13. YARN-2468.5.1.patch
        96 kB
        Xuan Gong
      14. YARN-2468.5.1.patch
        100 kB
        Xuan Gong
      15. YARN-2468.5.patch
        96 kB
        Xuan Gong
      16. YARN-2468.4.1.patch
        103 kB
        Xuan Gong
      17. YARN-2468.4.patch
        98 kB
        Xuan Gong
      18. YARN-2468.3.rebase.2.patch
        98 kB
        Xuan Gong
      19. YARN-2468.3.rebase.patch
        97 kB
        Xuan Gong
      20. YARN-2468.3.patch
        97 kB
        Xuan Gong
      21. YARN-2468.2.patch
        95 kB
        Xuan Gong
      22. YARN-2468.1.patch
        91 kB
        Xuan Gong

        Issue Links

          Activity

          Hide
          xgong Xuan Gong added a comment -

          We will rely on user’s log application (such as log4j) to do the rollover for the logs. We have already exposed an environment variable LOG_DIR that the users can use to set up their log application. We will provide a log service that will upload all the qualified logs periodically.

          Several changes in this patch:
          1. Create a logContext (add into ApplicationSubmissionContext), which includes include_patterns, exclude_patterns (Those are used to filter the logs) and intervals (defines how often log aggregation service uploads the container logs).
          2. AppLogAggregatorImpl will upload container logs periodically instead of waiting for the application to finish.
          3. Change the log layout. Currently, all container logs (for the same NM) will be written into a single file. This does not work for LRS. We will create a directory (named as node id of the NM), under this directory, every time when AppLogAggregatorImpl starts to upload container logs; it will create a file (named as node_id + timestamp).

          Show
          xgong Xuan Gong added a comment - We will rely on user’s log application (such as log4j) to do the rollover for the logs. We have already exposed an environment variable LOG_DIR that the users can use to set up their log application. We will provide a log service that will upload all the qualified logs periodically. Several changes in this patch: 1. Create a logContext (add into ApplicationSubmissionContext), which includes include_patterns, exclude_patterns (Those are used to filter the logs) and intervals (defines how often log aggregation service uploads the container logs). 2. AppLogAggregatorImpl will upload container logs periodically instead of waiting for the application to finish. 3. Change the log layout. Currently, all container logs (for the same NM) will be written into a single file. This does not work for LRS. We will create a directory (named as node id of the NM), under this directory, every time when AppLogAggregatorImpl starts to upload container logs; it will create a file (named as node_id + timestamp).
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12665186/YARN-2468.1.patch
          against trunk revision d8774cc.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to cause Findbugs (version 2.0.3) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4761//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4761//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665186/YARN-2468.1.patch against trunk revision d8774cc. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to cause Findbugs (version 2.0.3) to fail. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in . +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4761//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4761//console This message is automatically generated.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          The overall proposal is fine.

          3. Change the log layout. Currently, all container logs (for the same NM) will be written into a single file. This does not work for LRS. We will create a directory (named as node id of the NM), under this directory, every time when AppLogAggregatorImpl starts to upload container logs; it will create a file (named as node_id + timestamp).

          We already have the too-many-files problem, this solution will make it worse. I'll give that this is a hard problem to solve - need to think about it a little more.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - The overall proposal is fine. 3. Change the log layout. Currently, all container logs (for the same NM) will be written into a single file. This does not work for LRS. We will create a directory (named as node id of the NM), under this directory, every time when AppLogAggregatorImpl starts to upload container logs; it will create a file (named as node_id + timestamp). We already have the too-many-files problem, this solution will make it worse. I'll give that this is a hard problem to solve - need to think about it a little more.
          Hide
          zjshen Zhijie Shen added a comment -

          I'm afraid it may be not fair to compare the log file creation between a single long running service and a short-term application.

          I'm thinking about the file problem in a different direction. Let's see how many log files will be created for a YARN cluster. For example, a long running service takes 10% resource from the cluster, and runs for 10 days. On each day, it will spawn out 1 log file per day. On the other side, for example, a normal application also takes 10% resource from the cluster, runs for 1 days, and spawn out 1 log file. Suppose the application will be started every day. Over 10 days, the number of spawned logs of both the long running service and the 10 iterations of the application is 10.

          So from the point of view of the cluster, the number of logs is proportional to the resource usage instead of the application number. The similar resource usage may result in the similar number of log files. The case may not becoming even worse if we take the whole cluster into account. However, I agree we loose the opportunity to even make a long running service to use a single log file, reducing the total log file number.

          To completely resolve the too-many-files problem, we make think of timeline server, which has the store layer to deal with the real I/O on your behalf. Another optimization may be log retention, I'm not sure the feature already exists or have been proposed together in this solution.

          Show
          zjshen Zhijie Shen added a comment - I'm afraid it may be not fair to compare the log file creation between a single long running service and a short-term application. I'm thinking about the file problem in a different direction. Let's see how many log files will be created for a YARN cluster. For example, a long running service takes 10% resource from the cluster, and runs for 10 days. On each day, it will spawn out 1 log file per day. On the other side, for example, a normal application also takes 10% resource from the cluster, runs for 1 days, and spawn out 1 log file. Suppose the application will be started every day. Over 10 days, the number of spawned logs of both the long running service and the 10 iterations of the application is 10. So from the point of view of the cluster, the number of logs is proportional to the resource usage instead of the application number. The similar resource usage may result in the similar number of log files. The case may not becoming even worse if we take the whole cluster into account. However, I agree we loose the opportunity to even make a long running service to use a single log file, reducing the total log file number. To completely resolve the too-many-files problem, we make think of timeline server, which has the store layer to deal with the real I/O on your behalf. Another optimization may be log retention, I'm not sure the feature already exists or have been proposed together in this solution.
          Hide
          xgong Xuan Gong added a comment -

          Did more investigations and offline discussions. It turns out this is a really hard problem. So, we decide to solve this step by step.

          For the first step, we will stick to the original proposal: change the log layout, create a directory (named as node id of the NM), under this directory, every time when AppLogAggregatorImpl starts to upload container logs; it will create a file (named as node_id + timestamp).
          This method will increase the number of log files, but it will work fine for a small cluster.

          For the next step, we need to find a better way to handle the logs more efficiently. We would like to aggregate all containers’ log (Those containers are belong to the same NM) in a single file. In that case, the total number of logs is bounded. But we need find more scalable way, other than TFile, to do it. Will open a separate ticket for this.

          Show
          xgong Xuan Gong added a comment - Did more investigations and offline discussions. It turns out this is a really hard problem. So, we decide to solve this step by step. For the first step, we will stick to the original proposal: change the log layout, create a directory (named as node id of the NM), under this directory, every time when AppLogAggregatorImpl starts to upload container logs; it will create a file (named as node_id + timestamp). This method will increase the number of log files, but it will work fine for a small cluster. For the next step, we need to find a better way to handle the logs more efficiently. We would like to aggregate all containers’ log (Those containers are belong to the same NM) in a single file. In that case, the total number of logs is bounded. But we need find more scalable way, other than TFile, to do it. Will open a separate ticket for this.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12668421/YARN-2468.2.patch
          against trunk revision 3122daa.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 4 warning messages.
          See https://builds.apache.org/job/PreCommit-YARN-Build/4934//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4934//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4934//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4934//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668421/YARN-2468.2.patch against trunk revision 3122daa. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 javadoc . The javadoc tool appears to have generated 4 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4934//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4934//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4934//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4934//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12668465/YARN-2468.3.patch
          against trunk revision 54e5794.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          -1 javac. The applied patch generated 1265 javac compiler warnings (more than the trunk's current 1011 warnings).

          -1 javadoc. The javadoc tool appears to have generated 109 warning messages.
          See https://builds.apache.org/job/PreCommit-YARN-Build/4942//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4942//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4942//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4942//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668465/YARN-2468.3.patch against trunk revision 54e5794. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. -1 javac . The applied patch generated 1265 javac compiler warnings (more than the trunk's current 1011 warnings). -1 javadoc . The javadoc tool appears to have generated 109 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4942//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4942//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4942//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4942//console This message is automatically generated.
          Hide
          xgong Xuan Gong added a comment -

          create the patch based on the latest trunk

          Show
          xgong Xuan Gong added a comment - create the patch based on the latest trunk
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12668794/YARN-2468.3.rebase.patch
          against trunk revision 24d920b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
          org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4959//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4959//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668794/YARN-2468.3.rebase.patch against trunk revision 24d920b. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4959//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4959//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12668986/YARN-2468.3.rebase.2.patch
          against trunk revision 7e08c0f.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
          org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4967//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4967//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668986/YARN-2468.3.rebase.2.patch against trunk revision 7e08c0f. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4967//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4967//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669005/YARN-2468.4.patch
          against trunk revision 7e08c0f.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4968//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4968//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669005/YARN-2468.4.patch against trunk revision 7e08c0f. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4968//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4968//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669227/YARN-2468.4.1.patch
          against trunk revision 3e85f5b.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.logaggregation.TestAggregatedLogsBlock
          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4979//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4979//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669227/YARN-2468.4.1.patch against trunk revision 3e85f5b. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.logaggregation.TestAggregatedLogsBlock org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4979//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4979//console This message is automatically generated.
          Hide
          zjshen Zhijie Shen added a comment -

          Xuan Gong, thanks for the patch. I agree with the solution in general. Here're some comments so far about the patch. I may post more after a second scan.

          1. LogContext doesn't need to be in ApplicatonSubmissionContext, because ApplicatonSubmissionContext contains ContainerLaunchContext. LogContext is container related stuff, such that ContainerLaunchContext should be the best place. Concurrently, we can have one context for all containers. Maybe in the future we can think of setting different LogContext for each individual container.

          2. Shouldn't be "exclude" patterns?

          + *     <li>logExcludePattern. It defines include patterns which is used to
          + *     filter the log files. The log files which match the include
          + *     patterns will not be uploaded.</li>
          

          3. Is the "day" unit too coarse?

          + *     how often the logAggregationSerivce uploads container logs in days
          

          4. The message is a bit confusing. The application is not running or completed? Then what's the app state?

          +        System.out.println("Application is not running or completed."
          +            + " Logs are only available when an application is running"
          +            + " or completes")
          

          5. Why not logContext = p.getLogContext() directly?

          +    LogContext logContext = null;
          +    if (p.getLogContext() != null) {
          +      logContext = new LogContextPBImpl(p.getLogContext());
          +    }
          

          6. Should the method be public?

              public int numOflogFiles() {
          

          7. In getFilteredLogFiles, the logic is that if the log file matches the include pattern, it will be added first, and if then if it matches the exclude pattern, it will be removed. Shall we do the sanity check to make sure we can not include and exclude the same pattern, otherwise, the semantics is a bit weird.

          8. The following code doesn't need to be executed by the doas ugi?

                this.remoteAppLogFile = remoteAppLogFile;
                fc = FileContext.getFileContext(conf);
                fc.setUMask(APP_LOG_FILE_UMASK);
          

          9. Changes in dumpAllContainersLogs just are just reformatting? If so, shall we undo it?

          10. Is it okay that we have removed the following code to close the writer?

              if (this.writer != null) {
                this.writer.close();
                LOG.info("Finished aggregate log-file for app " + this.applicationId);
              }
          

          And I've two general questions:

          1. If LogContext is not specified, we're running into the traditional log handling case, right? We will still have a combined log file identified by the node id? Or node id will always be the directory, and there exists only one file under it?

          2. Let's say if work-preserving NM restarting happens, NM is going to forget all the uploaded logs files, and redo everything, right?

          Show
          zjshen Zhijie Shen added a comment - Xuan Gong , thanks for the patch. I agree with the solution in general. Here're some comments so far about the patch. I may post more after a second scan. 1. LogContext doesn't need to be in ApplicatonSubmissionContext, because ApplicatonSubmissionContext contains ContainerLaunchContext. LogContext is container related stuff, such that ContainerLaunchContext should be the best place. Concurrently, we can have one context for all containers. Maybe in the future we can think of setting different LogContext for each individual container. 2. Shouldn't be "exclude" patterns? + * <li>logExcludePattern. It defines include patterns which is used to + * filter the log files. The log files which match the include + * patterns will not be uploaded.</li> 3. Is the "day" unit too coarse? + * how often the logAggregationSerivce uploads container logs in days 4. The message is a bit confusing. The application is not running or completed? Then what's the app state? + System .out.println( "Application is not running or completed." + + " Logs are only available when an application is running" + + " or completes" ) 5. Why not logContext = p.getLogContext() directly? + LogContext logContext = null ; + if (p.getLogContext() != null ) { + logContext = new LogContextPBImpl(p.getLogContext()); + } 6. Should the method be public? public int numOflogFiles() { 7. In getFilteredLogFiles, the logic is that if the log file matches the include pattern, it will be added first, and if then if it matches the exclude pattern, it will be removed. Shall we do the sanity check to make sure we can not include and exclude the same pattern, otherwise, the semantics is a bit weird. 8. The following code doesn't need to be executed by the doas ugi? this .remoteAppLogFile = remoteAppLogFile; fc = FileContext.getFileContext(conf); fc.setUMask(APP_LOG_FILE_UMASK); 9. Changes in dumpAllContainersLogs just are just reformatting? If so, shall we undo it? 10. Is it okay that we have removed the following code to close the writer? if ( this .writer != null ) { this .writer.close(); LOG.info( "Finished aggregate log-file for app " + this .applicationId); } And I've two general questions: 1. If LogContext is not specified, we're running into the traditional log handling case, right? We will still have a combined log file identified by the node id? Or node id will always be the directory, and there exists only one file under it? 2. Let's say if work-preserving NM restarting happens, NM is going to forget all the uploaded logs files, and redo everything, right?
          Hide
          xgong Xuan Gong added a comment -

          If LogContext is not specified, we're running into the traditional log handling case, right? We will still have a combined log file identified by the node id? Or node id will always be the directory, and there exists only one file under it?

          node id will always be the directory, and there exists only one file under it

          Let's say if work-preserving NM restarting happens, NM is going to forget all the uploaded logs files, and redo everything, right?

          If NM restarts happens, it will upload all logs which are previous uploaded, but not deleted.
          I think that we can solve this problem in separate ticket, because this ticket is the first step to solve Log handling for LRS.

          LogContext doesn't need to be in ApplicatonSubmissionContext, because ApplicatonSubmissionContext contains ContainerLaunchContext. LogContext is container related stuff, such that ContainerLaunchContext should be the best place. Concurrently, we can have one context for all containers. Maybe in the future we can think of setting different LogContext for each individual container.

          DONE

          In getFilteredLogFiles, the logic is that if the log file matches the include pattern, it will be added first, and if then if it matches the exclude pattern, it will be removed. Shall we do the sanity check to make sure we can not include and exclude the same pattern, otherwise, the semantics is a bit weird.

          Add more explanation in javaDoc.

          Uploaded a new patch to address all comments.

          Show
          xgong Xuan Gong added a comment - If LogContext is not specified, we're running into the traditional log handling case, right? We will still have a combined log file identified by the node id? Or node id will always be the directory, and there exists only one file under it? node id will always be the directory, and there exists only one file under it Let's say if work-preserving NM restarting happens, NM is going to forget all the uploaded logs files, and redo everything, right? If NM restarts happens, it will upload all logs which are previous uploaded, but not deleted. I think that we can solve this problem in separate ticket, because this ticket is the first step to solve Log handling for LRS. LogContext doesn't need to be in ApplicatonSubmissionContext, because ApplicatonSubmissionContext contains ContainerLaunchContext. LogContext is container related stuff, such that ContainerLaunchContext should be the best place. Concurrently, we can have one context for all containers. Maybe in the future we can think of setting different LogContext for each individual container. DONE In getFilteredLogFiles, the logic is that if the log file matches the include pattern, it will be added first, and if then if it matches the exclude pattern, it will be removed. Shall we do the sanity check to make sure we can not include and exclude the same pattern, otherwise, the semantics is a bit weird. Add more explanation in javaDoc. Uploaded a new patch to address all comments.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669460/YARN-2468.5.patch
          against trunk revision d9a8603.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 3 warning messages.
          See https://builds.apache.org/job/PreCommit-YARN-Build/4995//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4995//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4995//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669460/YARN-2468.5.patch against trunk revision d9a8603. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 javadoc . The javadoc tool appears to have generated 3 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/4995//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4995//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4995//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669521/YARN-2468.5.1.patch
          against trunk revision f230248.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5007//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5007//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669521/YARN-2468.5.1.patch against trunk revision f230248. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5007//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5007//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669602/YARN-2468.5.2.patch
          against trunk revision 47e5e19.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5017//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669602/YARN-2468.5.2.patch against trunk revision 47e5e19. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5017//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669602/YARN-2468.5.2.patch
          against trunk revision 47e5e19.

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5018//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669602/YARN-2468.5.2.patch against trunk revision 47e5e19. -1 patch . Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5018//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669611/YARN-2468.5.3.patch
          against trunk revision 47e5e19.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5019//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5019//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669611/YARN-2468.5.3.patch against trunk revision 47e5e19. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5019//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5019//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669611/YARN-2468.5.3.patch
          against trunk revision 47e5e19.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5020//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5020//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669611/YARN-2468.5.3.patch against trunk revision 47e5e19. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5020//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5020//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669626/YARN-2468.5.4.patch
          against trunk revision 10e8602.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5022//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5022//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669626/YARN-2468.5.4.patch against trunk revision 10e8602. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5022//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5022//console This message is automatically generated.
          Hide
          zjshen Zhijie Shen added a comment -

          1. Shall we say "if the log file name matches both the include and the exclude patterns, the file will be excluded eventually"?

          + *     filter the log files. The log files which match the include
          + *     patterns will not be uploaded. If any patterns are defined in
          + *     both logIncludePatterns and logExcludePatterns, those files matched
          + *     with these patterns will not be uploaded.</li>
          

          2. Except this method, other new APIs are marked @Stable. Is it a bit aggressive as it is not a simple new feature?

          +public abstract class LogContext {
          +
          +  @Public
          +  @Unstable
          +  public static LogContext newInstance(Set<String> logIncludePatterns,
          

          3. Is ACCEPTED or SUBMITTED to early to say the log is available? The AM container my not have been launched yet.

                 switch (appReport.getYarnApplicationState()) {
                 case NEW:
                 case NEW_SAVING:
          +        return -1;
                 case ACCEPTED:
                 case SUBMITTED:
                 case RUNNING:
          -        return -1;
          

          4. numOflogFiles -> getNumOfLogFilesToUpload?

          5. The following code always executes no matter the include patterns are applied or not. getFilteredLogFiles log seem to be improvable.

                Set<File> candiates =
                    new HashSet<File>(Arrays.asList(containerLogDir.listFiles()));
          

          6. Why not just tmp suffix?

            public static final String TMP_FILE = "TEMP.tmp";
          

          7. Should you call getRemoteNodeLogDirForApp here, though both methods seem to do the same thing?

          +    Path remoteAppNodeLogDir = LogAggregationUtils.getRemoteNodeLogFileForApp(
          

          8. In this case should we return -1?

              if (!foundContainerLogs) {
                System.out.println("Logs for container " + containerId
                    + " are not present in this log-file.");
              }
          

          9. Do you want to remove getRemoteNodeLogFileForApp (in LogAggregationUtils and LogAggregationService), which is not used by production code any more, and getRemoteNodeLogDirForApp is doing the identical stuff.

          10. Not right variable name convention.

                      Path node_appDir = LogAggregationUtils.getRemoteNodeLogDirForApp(
          
          +    long time_stamp = 0;
          
          +    while (count <= max_attempt) {
          

          11. Change appLogAggregators to public directly?

            @Private
            @VisibleForTesting
            public ConcurrentMap<ApplicationId, AppLogAggregator> getAppLogAggregators() {
              return this.appLogAggregators;
            }
          

          12. In testLogAggregationServiceWithInterval, round 1, 2, 3's logic seems to be similar. It is possible to have iteration of i = 1 -> 3?

          3. Is the "day" unit too coarse?

          If we want to use "day" as the unit, shall we allow "double" interval, such that I can say 0.1 day. I think it would be useful for testing purpose (not unit test, but verifying the feature is working on purpose, and the app is using the feature correctly). Thoughts?

          I think that we can solve this problem in separate ticket, because this ticket is the first step to solve Log handling for LRS.

          Sounds good. It shouldn't be a common case, unless we config to delay the log deletion for debugging purpose. Please file a Jira about it.

          Show
          zjshen Zhijie Shen added a comment - 1. Shall we say "if the log file name matches both the include and the exclude patterns, the file will be excluded eventually"? + * filter the log files. The log files which match the include + * patterns will not be uploaded. If any patterns are defined in + * both logIncludePatterns and logExcludePatterns, those files matched + * with these patterns will not be uploaded.</li> 2. Except this method, other new APIs are marked @Stable. Is it a bit aggressive as it is not a simple new feature? + public abstract class LogContext { + + @Public + @Unstable + public static LogContext newInstance(Set< String > logIncludePatterns, 3. Is ACCEPTED or SUBMITTED to early to say the log is available? The AM container my not have been launched yet. switch (appReport.getYarnApplicationState()) { case NEW: case NEW_SAVING: + return -1; case ACCEPTED: case SUBMITTED: case RUNNING: - return -1; 4. numOflogFiles -> getNumOfLogFilesToUpload? 5. The following code always executes no matter the include patterns are applied or not. getFilteredLogFiles log seem to be improvable. Set<File> candiates = new HashSet<File>(Arrays.asList(containerLogDir.listFiles())); 6. Why not just tmp suffix? public static final String TMP_FILE = "TEMP.tmp" ; 7. Should you call getRemoteNodeLogDirForApp here, though both methods seem to do the same thing? + Path remoteAppNodeLogDir = LogAggregationUtils.getRemoteNodeLogFileForApp( 8. In this case should we return -1? if (!foundContainerLogs) { System .out.println( "Logs for container " + containerId + " are not present in this log-file." ); } 9. Do you want to remove getRemoteNodeLogFileForApp (in LogAggregationUtils and LogAggregationService), which is not used by production code any more, and getRemoteNodeLogDirForApp is doing the identical stuff. 10. Not right variable name convention. Path node_appDir = LogAggregationUtils.getRemoteNodeLogDirForApp( + long time_stamp = 0; + while (count <= max_attempt) { 11. Change appLogAggregators to public directly? @Private @VisibleForTesting public ConcurrentMap<ApplicationId, AppLogAggregator> getAppLogAggregators() { return this .appLogAggregators; } 12. In testLogAggregationServiceWithInterval, round 1, 2, 3's logic seems to be similar. It is possible to have iteration of i = 1 -> 3? 3. Is the "day" unit too coarse? If we want to use "day" as the unit, shall we allow "double" interval, such that I can say 0.1 day. I think it would be useful for testing purpose (not unit test, but verifying the feature is working on purpose, and the app is using the feature correctly). Thoughts? I think that we can solve this problem in separate ticket, because this ticket is the first step to solve Log handling for LRS. Sounds good. It shouldn't be a common case, unless we config to delay the log deletion for debugging purpose. Please file a Jira about it.
          Hide
          xgong Xuan Gong added a comment -

          If we want to use "day" as the unit, shall we allow "double" interval, such that I can say 0.1 day. I think it would be useful for testing purpose (not unit test, but verifying the feature is working on purpose, and the app is using the feature correctly). Thoughts?

          Add extra YarnConfiguration.DEBUG_LOG_AGGREGATION_SPEED_UP_RATIO for test/debug purpose.

          New patch also address other comments

          Show
          xgong Xuan Gong added a comment - If we want to use "day" as the unit, shall we allow "double" interval, such that I can say 0.1 day. I think it would be useful for testing purpose (not unit test, but verifying the feature is working on purpose, and the app is using the feature correctly). Thoughts? Add extra YarnConfiguration.DEBUG_LOG_AGGREGATION_SPEED_UP_RATIO for test/debug purpose. New patch also address other comments
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669802/YARN-2468.6.patch
          against trunk revision 474f116.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

          org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5029//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5029//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669802/YARN-2468.6.patch against trunk revision 474f116. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5029//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5029//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12669809/YARN-2468.6.1.patch
          against trunk revision a337f0e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

          org.apache.hadoop.yarn.client.cli.TestLogsCLI
          org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5030//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5030//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669809/YARN-2468.6.1.patch against trunk revision a337f0e. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.client.cli.TestLogsCLI org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5030//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5030//console This message is automatically generated.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Started looking at it. A few major issues

          • Scale: W.r.t the number of files per app per node. But I see you split that into YARN-2548.
          • We CANNOT change the directory structure for all existing apps that don't specify a context! You essentially doubled the number of inodes for each application's logs!
          • The API (specifically the rolling-interval) on ContainerLaunchContext isn't really respected per container - it only works for the first container on any given node. I propose that we change this to be per-application. Doing it per-container in a scalable manner for every container is challenging.
          • We are missing webUI support, REST API support that can be done separately.
          • We should also have admin limits on minimum rolling interval, maximum retention time (or number of files per long running container) etc.

          IAC, I think we should split the API part into its own JIRA and start with it.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Started looking at it. A few major issues Scale: W.r.t the number of files per app per node. But I see you split that into YARN-2548 . We CANNOT change the directory structure for all existing apps that don't specify a context! You essentially doubled the number of inodes for each application's logs! The API (specifically the rolling-interval) on ContainerLaunchContext isn't really respected per container - it only works for the first container on any given node. I propose that we change this to be per-application. Doing it per-container in a scalable manner for every container is challenging. We are missing webUI support, REST API support that can be done separately. We should also have admin limits on minimum rolling interval, maximum retention time (or number of files per long running container) etc. IAC, I think we should split the API part into its own JIRA and start with it.
          Hide
          xgong Xuan Gong added a comment -

          This is a very big patch and is hard for review. I would like to decouple the big patch to several smaller patches:
          1) api changes will be tracked by YARN-2569
          2) NMs need to find a way to get LogAggregationContext. This will be tracked by YARN-2581
          3) Current ticket will be used to track changes for NM handling the logs for LRS which will include the log layout changes
          4) Log Deletion Service changes will be tracked by YARN-2583
          5) Related CL and web UI changes will be tracked by YARN-2582

          Show
          xgong Xuan Gong added a comment - This is a very big patch and is hard for review. I would like to decouple the big patch to several smaller patches: 1) api changes will be tracked by YARN-2569 2) NMs need to find a way to get LogAggregationContext. This will be tracked by YARN-2581 3) Current ticket will be used to track changes for NM handling the logs for LRS which will include the log layout changes 4) Log Deletion Service changes will be tracked by YARN-2583 5) Related CL and web UI changes will be tracked by YARN-2582
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12671109/YARN-2468.7.1.patch
          against trunk revision 72b0881.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5114//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5114//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12671109/YARN-2468.7.1.patch against trunk revision 72b0881. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5114//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5114//console This message is automatically generated.
          Hide
          zjshen Zhijie Shen added a comment -

          The patch is generally good. Some minor comments, and puzzles about the code.

          1. The first one is @VisibleForTesting? And the second one is not necessary?

          -  private static String getNodeString(NodeId nodeId) {
          +  public static String getNodeString(NodeId nodeId) {
               return nodeId.toString().replace(":", "_");
             }
          -  
          +
          +  public static String getNodeString(String nodeId) {
          +    return nodeId.replace(":", "_");
          +  }
          

          2. Add a TODO to say the test will be fixed in a in followup Jira, in case we forget it?

          +  @Ignore
             @Test
             public void testNoLogs() throws Exception {
          

          3. Based on my understanding, uploadedFiles is the candidate files to upload? If so, can we rename the variables and related methods?

          +    private Set<File> uploadedFiles = new HashSet<File>();
          

          4. I assume this var is going to capture all the existing log files on HDFS, isn't it? If so, the computation of it seems to be problematic, because it doesn't exclude the files to be excluded. And what's the effect on alreadyUploadedLogs?

          +    private Set<String> allExistingFileMeta = new HashSet<String>();
          
                Iterable<String> mask =
                    Iterables.filter(alreadyUploadedLogs, new Predicate<String>() {
                      @Override
                      public boolean apply(String next) {
                        return currentExistingLogFiles.contains(next);
                      }
                    });
          

          5. Make the old LogValue constructor based on the new one?

          6. LogValue.write is not necessary to be changed?

          7. It's recommended to close the Closable objects via IOUtils, but it seems that AggregatedLogFormat already has this issue before. Let's file a separate ticket for it.

          +        if (this.fsDataOStream != null) {
          +          this.fsDataOStream.close();
          +        }
          

          8. nodeId seems to be of no use. No need to be passed into AppLogAggregatorImpl.

          +  private final NodeId nodeId;
          

          9. remoteNodeLogDirForApp doesn't affect remoteNodeTmpLogFileForApp, which only depends on remoteNodeLogFileForApp. remoteNodeLogFileForApp is determined at construction, so remoteNodeTmpLogFileForApp should be final and computed once in constructor as well. And constructor param remoteNodeLogDirForApp should be renamed back to remoteNodeLogFileForApp.

          -  private final Path remoteNodeTmpLogFileForApp;
          +  private Path remoteNodeTmpLogFileForApp;
          
          -  private Path getRemoteNodeTmpLogFileForApp() {
          +  private Path getRemoteNodeTmpLogFileForApp(Path remoteNodeLogDirForApp) {
               return new Path(remoteNodeLogFileForApp.getParent(),
          -        (remoteNodeLogFileForApp.getName() + TMP_FILE_SUFFIX));
          +      (remoteNodeLogFileForApp.getName() + LogAggregationUtils.TMP_FILE_SUFFIX));
             }
          

          10. One typo

                // if any of the previous uoloaded logs have been deleted,
          

          11. One question: if one file is failed at uploading in LogValue.write(), uploadedFiles will not reflect the missing uploaded file, and it will not be uploaded again?

          Show
          zjshen Zhijie Shen added a comment - The patch is generally good. Some minor comments, and puzzles about the code. 1. The first one is @VisibleForTesting? And the second one is not necessary? - private static String getNodeString(NodeId nodeId) { + public static String getNodeString(NodeId nodeId) { return nodeId.toString().replace( ":" , "_" ); } - + + public static String getNodeString( String nodeId) { + return nodeId.replace( ":" , "_" ); + } 2. Add a TODO to say the test will be fixed in a in followup Jira, in case we forget it? + @Ignore @Test public void testNoLogs() throws Exception { 3. Based on my understanding, uploadedFiles is the candidate files to upload? If so, can we rename the variables and related methods? + private Set<File> uploadedFiles = new HashSet<File>(); 4. I assume this var is going to capture all the existing log files on HDFS, isn't it? If so, the computation of it seems to be problematic, because it doesn't exclude the files to be excluded. And what's the effect on alreadyUploadedLogs? + private Set< String > allExistingFileMeta = new HashSet< String >(); Iterable< String > mask = Iterables.filter(alreadyUploadedLogs, new Predicate< String >() { @Override public boolean apply( String next) { return currentExistingLogFiles.contains(next); } }); 5. Make the old LogValue constructor based on the new one? 6. LogValue.write is not necessary to be changed? 7. It's recommended to close the Closable objects via IOUtils, but it seems that AggregatedLogFormat already has this issue before. Let's file a separate ticket for it. + if ( this .fsDataOStream != null ) { + this .fsDataOStream.close(); + } 8. nodeId seems to be of no use. No need to be passed into AppLogAggregatorImpl. + private final NodeId nodeId; 9. remoteNodeLogDirForApp doesn't affect remoteNodeTmpLogFileForApp, which only depends on remoteNodeLogFileForApp. remoteNodeLogFileForApp is determined at construction, so remoteNodeTmpLogFileForApp should be final and computed once in constructor as well. And constructor param remoteNodeLogDirForApp should be renamed back to remoteNodeLogFileForApp. - private final Path remoteNodeTmpLogFileForApp; + private Path remoteNodeTmpLogFileForApp; - private Path getRemoteNodeTmpLogFileForApp() { + private Path getRemoteNodeTmpLogFileForApp(Path remoteNodeLogDirForApp) { return new Path(remoteNodeLogFileForApp.getParent(), - (remoteNodeLogFileForApp.getName() + TMP_FILE_SUFFIX)); + (remoteNodeLogFileForApp.getName() + LogAggregationUtils.TMP_FILE_SUFFIX)); } 10. One typo // if any of the previous uoloaded logs have been deleted, 11. One question: if one file is failed at uploading in LogValue.write(), uploadedFiles will not reflect the missing uploaded file, and it will not be uploaded again?
          Hide
          xgong Xuan Gong added a comment -

          One question: if one file is failed at uploading in LogValue.write(), uploadedFiles will not reflect the missing uploaded file, and it will not be uploaded again?

          Good catch. Fixed this in the new patch.

          Show
          xgong Xuan Gong added a comment - One question: if one file is failed at uploading in LogValue.write(), uploadedFiles will not reflect the missing uploaded file, and it will not be uploaded again? Good catch. Fixed this in the new patch.
          Hide
          zjshen Zhijie Shen added a comment -

          +1 for the latest patch

          Show
          zjshen Zhijie Shen added a comment - +1 for the latest patch
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12671569/YARN-2468.8.patch
          against trunk revision 6b7673e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

          org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
          org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
          org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
          org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
          org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch

          The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

          org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart
          org.apache.hadoop.yarn.server.TestContainerManageTests

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5159//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5159//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12671569/YARN-2468.8.patch against trunk revision 6b7673e. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart org.apache.hadoop.yarn.server.TestContainerManageTests +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5159//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5159//console This message is automatically generated.
          Hide
          zjshen Zhijie Shen added a comment -

          The test failures seems to be related to addressing binding conflicts. Restart a build.

          Show
          zjshen Zhijie Shen added a comment - The test failures seems to be related to addressing binding conflicts. Restart a build.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12671569/YARN-2468.8.patch
          against trunk revision 6b7673e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5161//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5161//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12671569/YARN-2468.8.patch against trunk revision 6b7673e. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5161//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5161//console This message is automatically generated.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Started reviewing. The approach looks fine. Code comments:

          AggregatedLogFormat

          • Why is the following needed in AggregatedLogFormat, given we are opening the file in create+overwrite mode?
                            if (fc.util().exists(remoteAppLogFile)) {
                              fc.delete(remoteAppLogFile, false);
                            }
            
          • pendingUploadFiles is really not neded to be a class field. Rename getNumOfLogFilesToUpload() to be getPendingLogFilesToUploadForThisContainer() and return the set of pending files. LogValue.write() can then take Set<File> pendingLogFilesToUpload as one of the arguments.
          • getFilteredLogFiles() -> getPendingLogFilesToUpload(). Also, in this method, excludeFiles variable is not needed - similar to included-files, you can update the "candidates" collection itself to exlude files using the filter.
          • fileMeta() -> getLogFileMetaData()
          • alreadyUploadedLogs is passed down from the app to LogValue every time we do log-upload. This is a list of all files across all containers and not just for this container. All of this can be simplified by a ContainerLogAggregator abstraction.

          AppLogAggregatorImpl

          • If deletion of previously uploaded file takes a while and the file remains by the time of the next cycle, we will upload it again? It seems to be, let's validate this via a test-case.
          • The writer per app in each cycle should be closed before trying to rename the file.
          • The retention-policy stuff is not finished yet, but the upload during each cycle should call shouldUploadLogs()
          • I don't like that apps that don't need periodic rolling also have to track a lot of information like currentUploadedFiles. We should separate the code paths for simplicity, may be in another JIRA.
          • currentUploadedFiles > uploadedFilePathsInThisCycle, alreadyUploadedLogs -> uploadedFileMetaInThisCycle, currentExistingLogFiles> existingFileMetaInThisCycle

          Why is the test in TestAggregatedLogsBlock ignored?

          TestLogAggregationService:

          • testLogAggregationServiceWithPatterns() needs a test with container using both include and exclude patterns and test the order of the pattern based filtering
          • testLogAggregationServiceWithInterval: doLogAggregationOutOfBand + Thread.sleep() is unreliable. Use a clock and refactor AppLogAggregatorImpl to have the cyclic aggregation directly callable via a method.
          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Started reviewing. The approach looks fine. Code comments: AggregatedLogFormat Why is the following needed in AggregatedLogFormat, given we are opening the file in create+overwrite mode? if (fc.util().exists(remoteAppLogFile)) { fc.delete(remoteAppLogFile, false ); } pendingUploadFiles is really not neded to be a class field. Rename getNumOfLogFilesToUpload() to be getPendingLogFilesToUploadForThisContainer() and return the set of pending files. LogValue.write() can then take Set<File> pendingLogFilesToUpload as one of the arguments. getFilteredLogFiles() -> getPendingLogFilesToUpload(). Also, in this method, excludeFiles variable is not needed - similar to included-files, you can update the "candidates" collection itself to exlude files using the filter. fileMeta() -> getLogFileMetaData() alreadyUploadedLogs is passed down from the app to LogValue every time we do log-upload. This is a list of all files across all containers and not just for this container. All of this can be simplified by a ContainerLogAggregator abstraction. AppLogAggregatorImpl If deletion of previously uploaded file takes a while and the file remains by the time of the next cycle, we will upload it again? It seems to be, let's validate this via a test-case. The writer per app in each cycle should be closed before trying to rename the file. The retention-policy stuff is not finished yet, but the upload during each cycle should call shouldUploadLogs() I don't like that apps that don't need periodic rolling also have to track a lot of information like currentUploadedFiles. We should separate the code paths for simplicity, may be in another JIRA. currentUploadedFiles > uploadedFilePathsInThisCycle, alreadyUploadedLogs -> uploadedFileMetaInThisCycle, currentExistingLogFiles > existingFileMetaInThisCycle Why is the test in TestAggregatedLogsBlock ignored? TestLogAggregationService: testLogAggregationServiceWithPatterns() needs a test with container using both include and exclude patterns and test the order of the pattern based filtering testLogAggregationServiceWithInterval: doLogAggregationOutOfBand + Thread.sleep() is unreliable. Use a clock and refactor AppLogAggregatorImpl to have the cyclic aggregation directly callable via a method.
          Hide
          xgong Xuan Gong added a comment -

          Why is the test in TestAggregatedLogsBlock ignored?

          We will have YARN-2583 for web UI related changes. This test will be failed right now. So, I add @ignored

          pendingUploadFiles is really not neded to be a class field. Rename getNumOfLogFilesToUpload() to be getPendingLogFilesToUploadForThisContainer() and return the set of pending files. LogValue.write() can then take Set<File> pendingLogFilesToUpload as one of the arguments.

          I would like to check how many log files we can upload this time. If the number is 0, we can skip this time. And this check is also happened before LogKey.write(), otherwise, we will write key, but without value.

          If deletion of previously uploaded file takes a while and the file remains by the time of the next cycle, we will upload it again? It seems to be, let's validate this via a test-case.

          No, it will not. That is why I saved many information, such as allExistingFiles, alreadyUploadedFiles and etc. We will those to check whether the logs have been uploaded before.

          testLogAggregationServiceWithInterval: doLogAggregationOutOfBand + Thread.sleep() is unreliable. Use a clock and refactor AppLogAggregatorImpl to have the cyclic aggregation directly callable via a method.

          The Thread.sleep() is not used to trigger the logAggregation. It is used to make sure the logs has been uploaded into the remote directory. But, deleted those Thread.sleep() from the testcases.

          Show
          xgong Xuan Gong added a comment - Why is the test in TestAggregatedLogsBlock ignored? We will have YARN-2583 for web UI related changes. This test will be failed right now. So, I add @ignored pendingUploadFiles is really not neded to be a class field. Rename getNumOfLogFilesToUpload() to be getPendingLogFilesToUploadForThisContainer() and return the set of pending files. LogValue.write() can then take Set<File> pendingLogFilesToUpload as one of the arguments. I would like to check how many log files we can upload this time. If the number is 0, we can skip this time. And this check is also happened before LogKey.write(), otherwise, we will write key, but without value. If deletion of previously uploaded file takes a while and the file remains by the time of the next cycle, we will upload it again? It seems to be, let's validate this via a test-case. No, it will not. That is why I saved many information, such as allExistingFiles, alreadyUploadedFiles and etc. We will those to check whether the logs have been uploaded before. testLogAggregationServiceWithInterval: doLogAggregationOutOfBand + Thread.sleep() is unreliable. Use a clock and refactor AppLogAggregatorImpl to have the cyclic aggregation directly callable via a method. The Thread.sleep() is not used to trigger the logAggregation. It is used to make sure the logs has been uploaded into the remote directory. But, deleted those Thread.sleep() from the testcases.
          Hide
          xgong Xuan Gong added a comment -

          New patch addressed all other comments

          Show
          xgong Xuan Gong added a comment - New patch addressed all other comments
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12671923/YARN-2468.9.patch
          against trunk revision 0577eb3.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5178//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5178//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12671923/YARN-2468.9.patch against trunk revision 0577eb3. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5178//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5178//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12671951/YARN-2468.9.1.patch
          against trunk revision 0577eb3.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5181//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5181//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12671951/YARN-2468.9.1.patch against trunk revision 0577eb3. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5181//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5181//console This message is automatically generated.
          Hide
          zjshen Zhijie Shen added a comment -

          I would like to check how many log files we can upload this time. If the number is 0, we can skip this time. And this check is also happened before LogKey.write(), otherwise, we will write key, but without value.

          I think Vinod meant that pendingUploadFiles is needed, but doesn't need to the member variable. getPendingLogFilesToUploadForThisContainer can return this collection, and pass it into LogValue.write by adding one param of it.

          2. IMHO, the following code can be improved. If we use iterator, we can delete the unnecessary element on the fly.

                for (File file : candidates) {
                  Matcher fileMatcher = filterPattern.matcher(file.getName());
                  if (fileMatcher.find()) {
                    filteredFiles.add(file);
                  }
                }
                if (!exclusion) {
                  return filteredFiles;
                } else {
                  candidates.removeAll(filteredFiles);
                  return candidates;
                }
          

          This block could be:

          ...
          while(candidatesItr.hasNext()) {
            candidate = candidatesItr.next();
            ...
            if ((not match && inclusive) || (match && exclusive)) {
              candidatesItr.remove()
            } 
          }
          

          3. Jian He mentioned to me before that the following condition is not always true to determine an AM container. Any idea? And it seems that we don't need shouldUploadLogsForRunningContainer, we can re-use shouldUploadLogs and set wasContainerSuccessful to true. Personally, if it's not trivial to identify the AM container, I prefer to write a TODO comment and leave it until we implement the log retention API.

                if (containerId.getId() == 1) {
                  return true;
                }
          

          It seems to be, let's validate this via a test-case.

          Is it addressed by

              this.conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
          

          Is it better to add a line of comment of the rationale behind the config?

          5. Can the following code

              Set<ContainerId> finishedContainers = new HashSet<ContainerId>();
              for (ContainerId id : pendingContainerInThisCycle) {
                finishedContainers.add(id);
              }
          

          be simplified as

           Set<ContainerId> finishedContainers = new HashSet<ContainerId>(pendingContainerInThisCycle);
          
          Show
          zjshen Zhijie Shen added a comment - I would like to check how many log files we can upload this time. If the number is 0, we can skip this time. And this check is also happened before LogKey.write(), otherwise, we will write key, but without value. I think Vinod meant that pendingUploadFiles is needed, but doesn't need to the member variable. getPendingLogFilesToUploadForThisContainer can return this collection, and pass it into LogValue.write by adding one param of it. 2. IMHO, the following code can be improved. If we use iterator, we can delete the unnecessary element on the fly. for (File file : candidates) { Matcher fileMatcher = filterPattern.matcher(file.getName()); if (fileMatcher.find()) { filteredFiles.add(file); } } if (!exclusion) { return filteredFiles; } else { candidates.removeAll(filteredFiles); return candidates; } This block could be: ... while (candidatesItr.hasNext()) { candidate = candidatesItr.next(); ... if ((not match && inclusive) || (match && exclusive)) { candidatesItr.remove() } } 3. Jian He mentioned to me before that the following condition is not always true to determine an AM container. Any idea? And it seems that we don't need shouldUploadLogsForRunningContainer, we can re-use shouldUploadLogs and set wasContainerSuccessful to true. Personally, if it's not trivial to identify the AM container, I prefer to write a TODO comment and leave it until we implement the log retention API. if (containerId.getId() == 1) { return true ; } It seems to be, let's validate this via a test-case. Is it addressed by this .conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600); Is it better to add a line of comment of the rationale behind the config? 5. Can the following code Set<ContainerId> finishedContainers = new HashSet<ContainerId>(); for (ContainerId id : pendingContainerInThisCycle) { finishedContainers.add(id); } be simplified as Set<ContainerId> finishedContainers = new HashSet<ContainerId>(pendingContainerInThisCycle);
          Hide
          xgong Xuan Gong added a comment -

          new patch addressed all the comments

          Show
          xgong Xuan Gong added a comment - new patch addressed all the comments
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch
          against trunk revision a56f3ec.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5241//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5241//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch against trunk revision a56f3ec. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5241//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5241//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch
          against trunk revision f679ca3.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5244//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5244//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch against trunk revision f679ca3. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5244//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5244//console This message is automatically generated.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          This looks so much better, couple of minor nits

                  aggregator.doContainerLogAggregation(writer);
                  Set<Path> uploadedFilePathsInThisCycle =
                      aggregator.getUploadedFilePathInThisCycle();
          

          You can instead have the method doContainerLogAggregation() return the uploadedFilePathsInThisCycle and then remove

          • the method getUploadedFilePathInThisCycle()
          • and ContainerLogAggregator.uploadedFilePathInThisCycle field.

          Also ContainerLogAggregator.uploadedFileMeta is also not needed to be a class member.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - This looks so much better, couple of minor nits aggregator.doContainerLogAggregation(writer); Set<Path> uploadedFilePathsInThisCycle = aggregator.getUploadedFilePathInThisCycle(); You can instead have the method doContainerLogAggregation() return the uploadedFilePathsInThisCycle and then remove the method getUploadedFilePathInThisCycle() and ContainerLogAggregator.uploadedFilePathInThisCycle field. Also ContainerLogAggregator.uploadedFileMeta is also not needed to be a class member.
          Hide
          xgong Xuan Gong added a comment -

          Also ContainerLogAggregator.uploadedFileMeta is also not needed to be a class member.

          I think ContainerLogAggregator.uploadedFileMeta is needed to be a class member. It is used to keep track of all previous uploaded log files for each container. We will use this information to decide whether this log can be aggregated.

          New patch addressed other comments

          Show
          xgong Xuan Gong added a comment - Also ContainerLogAggregator.uploadedFileMeta is also not needed to be a class member. I think ContainerLogAggregator.uploadedFileMeta is needed to be a class member. It is used to keep track of all previous uploaded log files for each container. We will use this information to decide whether this log can be aggregated. New patch addressed other comments
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12672717/YARN-2468.11.patch
          against trunk revision 054f285.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5248//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5248//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672717/YARN-2468.11.patch against trunk revision 054f285. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5248//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5248//console This message is automatically generated.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Looks good, +1. Checking this in.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Looks good, +1. Checking this in.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Committed this to trunk, branch-2 and branch-2.6. Thanks Xuan!

          Tx for the reviews, Zhijie!

          I'll release note that this increases the number of log-files on HDFS to be greater than one for each long-running service that rolls logs. The current solution only works for a small number of long-running services, the scalability issue is tracked at YARN-2548.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Committed this to trunk, branch-2 and branch-2.6. Thanks Xuan! Tx for the reviews, Zhijie! I'll release note that this increases the number of log-files on HDFS to be greater than one for each long-running service that rolls logs. The current solution only works for a small number of long-running services, the scalability issue is tracked at YARN-2548 .
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6187 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6187/)
          YARN-2468. Enhanced NodeManager to support log handling APIs (YARN-2569) for use by long running services. Contributed by Xuan Gong. (vinodkv: rev 34cdcaad71cad76c0874a4e5266b4074009d2ffc)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #6187 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6187/ ) YARN-2468 . Enhanced NodeManager to support log handling APIs ( YARN-2569 ) for use by long running services. Contributed by Xuan Gong. (vinodkv: rev 34cdcaad71cad76c0874a4e5266b4074009d2ffc) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #700 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/700/)
          YARN-2468. Enhanced NodeManager to support log handling APIs (YARN-2569) for use by long running services. Contributed by Xuan Gong. (vinodkv: rev 34cdcaad71cad76c0874a4e5266b4074009d2ffc)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #700 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/700/ ) YARN-2468 . Enhanced NodeManager to support log handling APIs ( YARN-2569 ) for use by long running services. Contributed by Xuan Gong. (vinodkv: rev 34cdcaad71cad76c0874a4e5266b4074009d2ffc) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1891 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1891/)
          YARN-2468. Enhanced NodeManager to support log handling APIs (YARN-2569) for use by long running services. Contributed by Xuan Gong. (vinodkv: rev 34cdcaad71cad76c0874a4e5266b4074009d2ffc)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1891 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1891/ ) YARN-2468 . Enhanced NodeManager to support log handling APIs ( YARN-2569 ) for use by long running services. Contributed by Xuan Gong. (vinodkv: rev 34cdcaad71cad76c0874a4e5266b4074009d2ffc) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1916 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1916/)
          YARN-2468. Enhanced NodeManager to support log handling APIs (YARN-2569) for use by long running services. Contributed by Xuan Gong. (vinodkv: rev 34cdcaad71cad76c0874a4e5266b4074009d2ffc)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1916 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1916/ ) YARN-2468 . Enhanced NodeManager to support log handling APIs ( YARN-2569 ) for use by long running services. Contributed by Xuan Gong. (vinodkv: rev 34cdcaad71cad76c0874a4e5266b4074009d2ffc) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
          Hide
          rkannan82 Kannan Rajah added a comment -

          Xuan, I looked through the code changes and have a question about uploading logs for unfinished containers. Let's say we have already uploaded syslog for a container at time T1. At time T2, the container is still running and when the log aggregation is triggered again, will it re-upload the same syslog file? That seems to be the case.

          Show
          rkannan82 Kannan Rajah added a comment - Xuan, I looked through the code changes and have a question about uploading logs for unfinished containers. Let's say we have already uploaded syslog for a container at time T1. At time T2, the container is still running and when the log aggregation is triggered again, will it re-upload the same syslog file? That seems to be the case.
          Hide
          xgong Xuan Gong added a comment -

          Kannan Rajah

          Xuan, I looked through the code changes and have a question about uploading logs for unfinished containers. Let's say we have already uploaded syslog for a container at time T1. At time T2, the container is still running and when the log aggregation is triggered again, will it re-upload the same syslog file? That seems to be the case.

          It will not. EveryTime after we do the log aggregation, we will save the information for aggregated log file with (containerId.toString() + "" + file.getName() + ""+ file.lastModified()). So, in next run, before we start to upload logs, we will check the log file whether it exists in the savedAggregatedLogFileCache (uploadedFileMeta in AppLogAggregatorImpl), if it exists, we will skip. Otherwise, we will upload it.

          Show
          xgong Xuan Gong added a comment - Kannan Rajah Xuan, I looked through the code changes and have a question about uploading logs for unfinished containers. Let's say we have already uploaded syslog for a container at time T1. At time T2, the container is still running and when the log aggregation is triggered again, will it re-upload the same syslog file? That seems to be the case. It will not. EveryTime after we do the log aggregation, we will save the information for aggregated log file with (containerId.toString() + " " + file.getName() + " "+ file.lastModified()). So, in next run, before we start to upload logs, we will check the log file whether it exists in the savedAggregatedLogFileCache (uploadedFileMeta in AppLogAggregatorImpl), if it exists, we will skip. Otherwise, we will upload it.
          Hide
          rkannan82 Kannan Rajah added a comment -

          Thanks. But what about the case where the file was modified. Let's say 10 more lines were added to the syslog file. Doesn't it upload the full file again?

          Show
          rkannan82 Kannan Rajah added a comment - Thanks. But what about the case where the file was modified. Let's say 10 more lines were added to the syslog file. Doesn't it upload the full file again?
          Hide
          xgong Xuan Gong added a comment -

          Thanks. But what about the case where the file was modified. Let's say 10 more lines were added to the syslog file. Doesn't it upload the full file again?

          This is the pre-requirement: We will rely on user’s log application (such as log4j) to do the rollover for the logs. Users need to set up their log application properly. For our side, we upload every logs in our log dirs.

          Show
          xgong Xuan Gong added a comment - Thanks. But what about the case where the file was modified. Let's say 10 more lines were added to the syslog file. Doesn't it upload the full file again? This is the pre-requirement: We will rely on user’s log application (such as log4j) to do the rollover for the logs. Users need to set up their log application properly. For our side, we upload every logs in our log dirs.
          Hide
          rkannan82 Kannan Rajah added a comment -

          Makes sense. Sorry, but I have just one last question, not completely relevant to this JIRA though. Is there any ongoing effort to write the logs directly to HDFS instead of this 2 phase approach? If not, can you point out the reasons? This work being done to take care of the lifecycle of these logs seem fairly complex and also potentially adding performance overhead to the cluster. So I am interested to understand the rationale.

          Show
          rkannan82 Kannan Rajah added a comment - Makes sense. Sorry, but I have just one last question, not completely relevant to this JIRA though. Is there any ongoing effort to write the logs directly to HDFS instead of this 2 phase approach? If not, can you point out the reasons? This work being done to take care of the lifecycle of these logs seem fairly complex and also potentially adding performance overhead to the cluster. So I am interested to understand the rationale.
          Hide
          lars_francke Lars Francke added a comment -

          I know this is an old issue but I can't find any documentation on the behaviour. It'd be fabulous if someone could summarise what the status is and what's changed here and which parts of it are configurable.

          The very first post also mentions a LOG_DIR environment variable but I can't find any mention of it either.

          So what do applications need to do with their Log files to be handled properly?

          Show
          lars_francke Lars Francke added a comment - I know this is an old issue but I can't find any documentation on the behaviour. It'd be fabulous if someone could summarise what the status is and what's changed here and which parts of it are configurable. The very first post also mentions a LOG_DIR environment variable but I can't find any mention of it either. So what do applications need to do with their Log files to be handled properly?

            People

            • Assignee:
              xgong Xuan Gong
              Reporter:
              xgong Xuan Gong
            • Votes:
              0 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development