
Skip unnecessary NN operations in log aggregation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The log aggregation service can issue unnecessary NameNode (NN) operations in the following scenarios:

      • No new local log has been created since the last upload, in the long running service scenario.
      • NM uses a ContainerLogAggregationPolicy that skips log aggregation for certain containers.

      In the following code snippet, even when pendingContainerInThisCycle is empty, the code still creates the writer and removes the file later. This introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't aggregate logs for an app.

      AppLogAggregatorImpl.java
      ......
              writer =
                  new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
                      this.userUgi);
      ......
            for (ContainerId container : pendingContainerInThisCycle) {
      ......
            }
      ......
                  if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
                    if (rename) {
                      remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
                    } else {
                      remoteFS.delete(remoteNodeTmpLogFileForApp, false);
                    }
                  }
      ......
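
The idea behind the fix can be sketched as follows. This is a minimal, self-contained illustration and not the committed patch: CountingRemoteFs and UploadCycleSketch are hypothetical stand-ins that only count NN-style calls, and the early return is an assumed way to skip the writer when nothing is pending.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the remote file system; it only counts NN-style calls. */
class CountingRemoteFs {
    int nnCalls = 0;
    private boolean tmpFileExists = false;

    void create() { nnCalls++; tmpFileExists = true; }
    boolean exists() { nnCalls++; return tmpFileExists; }
    void rename() { nnCalls++; tmpFileExists = false; }
    void delete() { nnCalls++; tmpFileExists = false; }
}

/** Hypothetical sketch of one upload cycle; names are illustrative only. */
class UploadCycleSketch {
    /** Runs one cycle and returns the number of containers handled. */
    static int uploadLogs(CountingRemoteFs fs, List<String> pendingContainerInThisCycle) {
        if (pendingContainerInThisCycle.isEmpty()) {
            // Nothing to aggregate: skip the create/exists/delete round trip entirely.
            return 0;
        }
        fs.create(); // corresponds to constructing the LogWriter
        int uploaded = 0;
        for (String container : pendingContainerInThisCycle) {
            uploaded++; // a real implementation would append this container's logs
        }
        if (fs.exists()) {
            if (uploaded > 0) {
                fs.rename();
            } else {
                fs.delete();
            }
        }
        return uploaded;
    }
}
```

With an empty pending list the sketch makes no file system calls at all; with one pending container it makes three (create, exists, rename).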
      
      1. YARN-4720.01.patch
        1 kB
        Jun Gong
      2. YARN-4720.02.patch
        6 kB
        Jun Gong
      3. YARN-4720.03.patch
        8 kB
        Jun Gong
      4. YARN-4720.04.patch
        9 kB
        Jun Gong
      5. YARN-4720.05.patch
        9 kB
        Jun Gong

        Activity

        hex108 Jun Gong added a comment -

        Thanks Ming Ma for reporting. I just attached a patch for it.

        mingma Ming Ma added a comment -

        Thanks Jun Gong for the patch. It addresses the first scenario, the long running service, where uploadLogsForContainers is called while the app is running. Can the patch also cover the second scenario, a custom ContainerLogAggregationPolicy, after the app has finished?

        New unit tests might be tricky, but some kind of verification would be useful; for example, confirming that an unnecessary running LogAggregationReport is skipped.

        hex108 Jun Gong added a comment -

        Thanks Ming Ma for the review. Attached a new patch to address the above problems.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 7m 2s trunk passed
        +1 compile 0m 26s trunk passed with JDK v1.8.0_72
        +1 compile 0m 29s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 15s trunk passed
        +1 mvnsite 0m 28s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 findbugs 0m 49s trunk passed
        +1 javadoc 0m 18s trunk passed with JDK v1.8.0_72
        +1 javadoc 0m 22s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 25s the patch passed
        +1 compile 0m 20s the patch passed with JDK v1.8.0_72
        +1 javac 0m 20s the patch passed
        +1 compile 0m 24s the patch passed with JDK v1.7.0_95
        +1 javac 0m 24s the patch passed
        -1 checkstyle 0m 13s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 1 new + 17 unchanged - 1 fixed = 18 total (was 18)
        +1 mvnsite 0m 25s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 1m 0s the patch passed
        +1 javadoc 0m 15s the patch passed with JDK v1.8.0_72
        +1 javadoc 0m 20s the patch passed with JDK v1.7.0_95
        +1 unit 8m 55s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
        +1 unit 9m 22s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 18s Patch does not generate ASF License warnings.
        33m 44s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12789595/YARN-4720.02.patch
        JIRA Issue YARN-4720
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux f0125aa6792a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 3369a4f
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10626/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10626/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/10626/console
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        mingma Ming Ma added a comment -

        Thanks Jun Gong for the update. The patch looks good overall. It does change the following behaviors.

        • When pendingContainerInThisCycle is empty, NM will skip sending the LogAggregationReport with LogAggregationStatus.RUNNING. It means for a long running service, it is possible for a yarn client to get LogAggregationStatus.NOT_START when it calls ApplicationClientProtocol#getApplicationReport if the long running service doesn't generate any log. Without the patch, NM will send LogAggregationStatus.RUNNING regardless. So it might be better to still send LogAggregationStatus.RUNNING regardless.
        • When LogWriter creation throws exception and appFinished is true, NM will send a LogAggregationReport with LogAggregationStatus.SUCCEEDED. Without the patch, NM won't send any final LogAggregationReport. Maybe it is better to update the patch to send LogAggregationStatus.FAILED for such scenario.
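
The status choices discussed in the two points above can be sketched as a small decision function. This is only an illustration of the proposal, not the actual AppLogAggregatorImpl logic: ReportStatus mirrors the status names mentioned in this thread, and statusFor with its two boolean parameters is a hypothetical helper.

```java
/** Hypothetical mirror of the status values named in this thread. */
enum ReportStatus { RUNNING, RUNNING_WITH_FAILURE, SUCCEEDED, FAILED }

class ReportStatusSketch {
    /**
     * Picks the status to report after one aggregation cycle.
     * cycleSucceeded is false when, for example, creating the writer threw.
     * A cycle with nothing to upload is assumed to count as successful.
     */
    static ReportStatus statusFor(boolean appFinished, boolean cycleSucceeded) {
        if (appFinished) {
            // Final report: a failed cycle should not be reported as SUCCEEDED.
            return cycleSucceeded ? ReportStatus.SUCCEEDED : ReportStatus.FAILED;
        }
        // App still running: always report, even if no logs were uploaded.
        return cycleSucceeded ? ReportStatus.RUNNING : ReportStatus.RUNNING_WITH_FAILURE;
    }
}
```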
        hex108 Jun Gong added a comment -

        Thanks Ming Ma for the review and comments.

        "When pendingContainerInThisCycle is empty, NM will skip sending the LogAggregationReport with LogAggregationStatus.RUNNING. It means for a long running service, it is possible for a yarn client to get LogAggregationStatus.NOT_START when it calls ApplicationClientProtocol#getApplicationReport if the long running service doesn't generate any log. Without the patch, NM will send LogAggregationStatus.RUNNING regardless. So it might be better to still send LogAggregationStatus.RUNNING regardless."

        Yes, that is indeed a different behavior. A LogAggregationReport describes the current status, so is it necessary to send one if NM has not actually done any log aggregation?

        BTW: I noticed that previous LogAggregationReports are never cleaned up; there is only 'this.context.getLogAggregationStatusForApps().add()' and no 'remove'. Is that deliberate?

        "When LogWriter creation throws exception and appFinished is true, NM will send a LogAggregationReport with LogAggregationStatus.SUCCEEDED. Without the patch, NM won't send any final LogAggregationReport. Maybe it is better to update the patch to send LogAggregationStatus.FAILED for such scenario."

        I will update the patch to address it.

        mingma Ming Ma added a comment -

        It seems that LogAggregationStatus.RUNNING implies that the log aggregation service is running; it doesn't necessarily mean that NM has actually aggregated any logs. So if the long running service is running and hasn't generated any logs since it started, it is better to return LogAggregationStatus.RUNNING.

        Yes, NM can send several LogAggregationReports in the list, which is ordered; that is the API between NM and RM. The RM side then retrieves all elements from the list.

        hex108 Jun Gong added a comment -

        Thanks for explaining. Attached a new patch to fix it.

        "Yes, NM can send several LogAggregationReports in the list which is ordered; that is the API between NM and RM. Then on RM side, it will retrieve all elements from the list."

        IIUC, all LogAggregationReports (current and previous) are only ever added to 'context.getLogAggregationStatusForApps' and never removed.

        mingma Ming Ma added a comment -

        Ah, that is a good point. So for a long running service, the LogAggregationReport list NM sends to RM will grow over time. That sounds like a bug, but not one related to this jira. Jun Gong, do you want to open a separate jira for that?

        To have it send RUNNING report for all scenarios, how about moving the following block to finally?

              LogAggregationStatus logAggregationStatus =
                  logAggregationSucceedInThisCycle
                      ? LogAggregationStatus.RUNNING
                      : LogAggregationStatus.RUNNING_WITH_FAILURE;
              sendLogAggregationReport(logAggregationStatus, diagnosticMessage);
        

        Instead of creating a new operateWriterFailed, maybe the patch can reuse logAggregationSucceedInThisCycle.
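
A rough sketch of the "move it to finally" suggestion, assuming a simplified upload method: FinallyReportSketch, uploadCycle, and the writerCreationFails switch are hypothetical, and sentReports stands in for sendLogAggregationReport(...). Only the placement of the report inside finally reflects the proposal above.

```java
import java.util.ArrayList;
import java.util.List;

class FinallyReportSketch {
    /** Reports sent so far; stands in for sendLogAggregationReport(...). */
    final List<String> sentReports = new ArrayList<String>();

    /** One upload cycle; writerCreationFails simulates the writer constructor throwing. */
    void uploadCycle(List<String> pendingContainerInThisCycle, boolean writerCreationFails) {
        boolean logAggregationSucceedInThisCycle = true;
        try {
            if (pendingContainerInThisCycle.isEmpty()) {
                return; // skip the writer, but the finally block still reports
            }
            if (writerCreationFails) {
                // Reuse the existing success flag instead of adding a second one.
                logAggregationSucceedInThisCycle = false;
                return;
            }
            // ... upload the pending containers' logs here ...
        } finally {
            // Runs on every exit path, so a status is reported for all scenarios.
            sentReports.add(logAggregationSucceedInThisCycle
                ? "RUNNING"
                : "RUNNING_WITH_FAILURE");
        }
    }
}
```

Because the report lives in finally, the early returns for the empty and failed cases do not need their own reporting code.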

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 7m 29s trunk passed
        +1 compile 0m 25s trunk passed with JDK v1.8.0_72
        +1 compile 0m 27s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 16s trunk passed
        +1 mvnsite 0m 29s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 findbugs 0m 55s trunk passed
        +1 javadoc 0m 20s trunk passed with JDK v1.8.0_72
        +1 javadoc 0m 22s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 26s the patch passed
        +1 compile 0m 23s the patch passed with JDK v1.8.0_72
        +1 javac 0m 23s the patch passed
        +1 compile 0m 24s the patch passed with JDK v1.7.0_95
        +1 javac 0m 24s the patch passed
        -1 checkstyle 0m 14s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 1 new + 17 unchanged - 1 fixed = 18 total (was 18)
        +1 mvnsite 0m 28s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 1m 4s the patch passed
        +1 javadoc 0m 16s the patch passed with JDK v1.8.0_72
        +1 javadoc 0m 20s the patch passed with JDK v1.7.0_95
        +1 unit 9m 8s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
        +1 unit 9m 31s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 18s Patch does not generate ASF License warnings.
        34m 51s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12789856/YARN-4720.03.patch
        JIRA Issue YARN-4720
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 1e6841db56b4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 6979cbf
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10631/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10631/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/10631/console
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hex108 Jun Gong added a comment -

        Thanks for the suggestion. Attached a new patch to address it.

        "ah, that is a good point. So for long running service, the LogAggregationReport list NM sends to RM will grow over time. Sounds like a bug; but not something related to this jira. Jun Gong, you want to open a separate jira for that?"

        Thanks for the confirmation. Just created YARN-4735 to address it.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 10s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 6m 40s trunk passed
        +1 compile 0m 21s trunk passed with JDK v1.8.0_72
        +1 compile 0m 25s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 15s trunk passed
        +1 mvnsite 0m 27s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 findbugs 0m 48s trunk passed
        +1 javadoc 0m 17s trunk passed with JDK v1.8.0_72
        +1 javadoc 0m 21s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 23s the patch passed
        +1 compile 0m 20s the patch passed with JDK v1.8.0_72
        +1 javac 0m 20s the patch passed
        +1 compile 0m 23s the patch passed with JDK v1.7.0_95
        +1 javac 0m 23s the patch passed
        -1 checkstyle 0m 13s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 1 new + 18 unchanged - 1 fixed = 19 total (was 19)
        +1 mvnsite 0m 25s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 0m 58s the patch passed
        +1 javadoc 0m 15s the patch passed with JDK v1.8.0_72
        +1 javadoc 0m 19s the patch passed with JDK v1.7.0_95
        +1 unit 8m 47s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
        +1 unit 9m 19s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 18s Patch does not generate ASF License warnings.
        32m 46s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12789865/YARN-4720.04.patch
        JIRA Issue YARN-4720
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux c3b3cba6bf60 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 6979cbf
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10633/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10633/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/10633/console
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        mingma Ming Ma added a comment -

        +1 on the latest patch. I will wait until tomorrow to commit it in case others have more input.

        mingma Ming Ma added a comment -

        Even though Jenkins passed all tests, the new testSkipUnnecessaryNNOperations fails when run individually. Based on how the test is written, this.context.getLogAggregationStatusForApps().size() should be 4. It probably passes when all tests run together because the second doLogAggregationOutOfBand() and the LogHandlerAppFinishedEvent right after it cause only one notification. After fixing the size check to 4, an additional sleep after the second doLogAggregationOutOfBand() fixes the issue; alternatively, you can wait and verify getLogAggregationStatusForApps.size() after each doLogAggregationOutOfBand() call.

        In addition, it might be useful to add another test case without a long running service but with FailedOrKilledContainerLogAggregationPolicy; in that case, no log aggregation should happen. That will help verify the other scenario. What do you think, Jun Gong?
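
The "wait and verify" alternative to a fixed sleep could look roughly like the polling helper below. WaitForSketch and its waitFor method are hypothetical and written only for illustration; a real test would more likely use an existing wait utility from the project's test code.

```java
import java.util.function.BooleanSupplier;

class WaitForSketch {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }
}
```

A test could then call something like waitFor(() -> reports.size() == expected, 10, 5000) after each doLogAggregationOutOfBand() call, where reports and expected are placeholders for the test's own state.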

        hex108 Jun Gong added a comment -

        Thanks Ming Ma for the suggestions. I attached the wrong version of the patch (YARN-4720.04.patch)...

        The new patch fixes the problem and adds a new test.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 10s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 6m 28s trunk passed
        +1 compile 0m 23s trunk passed with JDK v1.8.0_72
        +1 compile 0m 26s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 15s trunk passed
        +1 mvnsite 0m 28s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 findbugs 0m 51s trunk passed
        +1 javadoc 0m 18s trunk passed with JDK v1.8.0_72
        +1 javadoc 0m 21s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 23s the patch passed
        +1 compile 0m 19s the patch passed with JDK v1.8.0_72
        +1 javac 0m 19s the patch passed
        +1 compile 0m 22s the patch passed with JDK v1.7.0_95
        +1 javac 0m 22s the patch passed
        -1 checkstyle 0m 13s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 1 new + 18 unchanged - 1 fixed = 19 total (was 19)
        +1 mvnsite 0m 26s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 1m 0s the patch passed
        +1 javadoc 0m 15s the patch passed with JDK v1.8.0_72
        +1 javadoc 0m 18s the patch passed with JDK v1.7.0_95
        +1 unit 9m 2s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72.
        +1 unit 9m 30s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 17s Patch does not generate ASF License warnings.
        33m 8s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12790051/YARN-4720.05.patch
        JIRA Issue YARN-4720
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux b196f627fb17 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / d7fdec1
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10645/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10645/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/10645/console
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9374 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9374/)
        YARN-4720. Skip unnecessary NN operations in log aggregation. (Jun Gong (mingma: rev 7f3139e54da2c496327446a5eac43f8421fc8839)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
        • hadoop-yarn-project/CHANGES.txt
        mingma Ming Ma added a comment -

        Committed to branch-2.8, branch-2 and trunk. Given the patch is based on YARN-1376 and YARN-221, it will require extra effort to backport it to branch-2.7 and branch-2.6. Thanks Jun Gong for the contribution!

        hex108 Jun Gong added a comment -

        Thanks Ming Ma for the review, suggestion and commit!


          People

          • Assignee:
            hex108 Jun Gong
            Reporter:
            mingma Ming Ma
          • Votes:
            0
            Watchers:
            11
