Hadoop YARN: YARN-415

Capture aggregate memory allocation at the app-level for chargeback

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.6.0
    • Component/s: resourcemanager
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      For the purpose of chargeback, I'd like to be able to compute the cost of an
      application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it.

      (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
      container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n)
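      For illustration, a minimal Java sketch of this sum (the ContainerRecord type, its field names, and the sample numbers are hypothetical, not part of any YARN API):

      import java.util.Arrays;
      import java.util.List;

      /** Illustration only: aggregates MB-seconds from per-container reservations. */
      public class MemorySecondsExample {

        /** Hypothetical record of one container's reservation. */
        static class ContainerRecord {
          final long reservedMb;  // memory reserved for the container, in MB
          final long startMs;     // time the container was allocated
          final long finishMs;    // time the container was released

          ContainerRecord(long reservedMb, long startMs, long finishMs) {
            this.reservedMb = reservedMb;
            this.startMs = startMs;
            this.finishMs = finishMs;
          }
        }

        /** (reserved RAM * lifetime) summed over all containers, in MB-seconds. */
        static long memorySeconds(List<ContainerRecord> containers) {
          long mbSeconds = 0;
          for (ContainerRecord c : containers) {
            mbSeconds += c.reservedMb * ((c.finishMs - c.startMs) / 1000);
          }
          return mbSeconds;
        }

        public static void main(String[] args) {
          // Two containers: 2048 MB for 60 s and 1024 MB for 30 s -> 153600 MB-seconds.
          List<ContainerRecord> containers = Arrays.asList(
              new ContainerRecord(2048, 0, 60_000),
              new ContainerRecord(1024, 0, 30_000));
          System.out.println(memorySeconds(containers));
        }
      }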

      It'd be nice to have this at the app level instead of the job level because:
      1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server).
      2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).

      This new metric should be available both through the RM UI and RM Web Services REST API.

      Attachments

      1. YARN-415.201409102216.txt
        121 kB
        Eric Payne
      2. YARN-415.201409092204.txt
        120 kB
        Eric Payne
      3. YARN-415.201409040036.txt
        120 kB
        Eric Payne
      4. YARN-415.201408212033.txt
        118 kB
        Eric Payne
      5. YARN-415.201408181938.txt
        119 kB
        Eric Payne
      6. YARN-415.201408181938.txt
        119 kB
        Eric Payne
      7. YARN-415.201408150030.txt
        119 kB
        Eric Payne
      8. YARN-415.201408132109.txt
        120 kB
        Eric Payne
      9. YARN-415.201408092006.txt
        116 kB
        Eric Payne
      10. YARN-415.201408080204.txt
        121 kB
        Eric Payne
      11. YARN-415.201408062232.txt
        113 kB
        Eric Payne
      12. YARN-415.201407281816.txt
        99 kB
        Eric Payne
      13. YARN-415.201407242148.txt
        79 kB
        Eric Payne
      14. YARN-415.201407232237.txt
        78 kB
        Eric Payne
      15. YARN-415.201407172144.txt
        71 kB
        Eric Payne
      16. YARN-415.201407171553.txt
        85 kB
        Eric Payne
      17. YARN-415.201407071542.txt
        83 kB
        Eric Payne
      18. YARN-415.201407042037.txt
        83 kB
        Eric Payne
      19. YARN-415.201406262136.txt
        81 kB
        Eric Payne
      20. YARN-415.201406031616.txt
        83 kB
        Eric Payne
      21. YARN-415.201405311749.txt
        83 kB
        Eric Payne
      22. YARN-415--n10.patch
        103 kB
        Andrey Klochkov
      23. YARN-415--n9.patch
        59 kB
        Andrey Klochkov
      24. YARN-415--n8.patch
        63 kB
        Andrey Klochkov
      25. YARN-415--n7.patch
        59 kB
        Andrey Klochkov
      26. YARN-415--n6.patch
        59 kB
        Andrey Klochkov
      27. YARN-415--n5.patch
        55 kB
        Andrey Klochkov
      28. YARN-415--n4.patch
        55 kB
        Andrey Klochkov
      29. YARN-415--n3.patch
        60 kB
        Andrey Klochkov
      30. YARN-415--n2.patch
        57 kB
        Andrey Klochkov
      31. YARN-415.patch
        57 kB
        Andrey Klochkov

        Issue Links

          Activity

          Arun C Murthy added a comment -

          +1 for the thought!

          Andy Rhee added a comment -

          +1 This feature is way overdue!

          Andrey Klochkov added a comment -

          The patch exposes MB-seconds and CPU-seconds through CLI, REST API and UI.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12604125/YARN-415.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1973//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1973//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1973//console

          This message is automatically generated.

          Andrey Klochkov added a comment -

          Updating the patch with fixes for the findbugs warnings on multithreaded correctness.

          Andrey Klochkov added a comment -

          Updating the patch with fixes in tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12604132/YARN-415--n2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1975//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1975//console

          This message is automatically generated.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12604142/YARN-415--n3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1977//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1977//console

          This message is automatically generated.

          Omkar Vinit Joshi added a comment -

          I am not sure if it affects this or not, but today if a client asks for, say, 2000, we round it off, and this patch will report usage based on the rounded-off value. Is that right?
          Also, a few questions about some scenarios:

          • What if a container is allocated by the RM and, before the AM launches it, the NM reboots (invalidating the earlier container)? Should the user be charged? And what if the AM reports back saying:
            • It doesn't want the container. I think it should still be charged, as it asked for one earlier, but I remember there being an issue where the RM allocated one more or one fewer than what the user requested (obviously an RM scheduler issue).
            • It doesn't want the container, perhaps because it is not able to launch it on the NM for whatever reason.
          • For short-running containers there is a gap between when the container actually finishes and when that is reported back to the RM; are we accounting for this? Similarly for the gap between the AM receiving the container and the RM either allocating it or processing the ACQUIRED event.
          • Also, today the container finish time is computed on the RM side. I think it should be done on the NM side and communicated back to the RM (with the implicit assumption that both servers are in sync).
          Andrey Klochkov added a comment -

          The proposed implementation uses events fired by the scheduler to track resource usage, so we start ticking as soon as a container is allocated by the scheduler and stop when the container is completed and the scheduler "gets" the resources back. Hence, if a container fails to start for some reason, we'll stop ticking as soon as the RM gets this reported. As for the gap between when a container actually finishes and when the RM gets the report, we don't account for it, i.e. the client will be charged until the RM gets the report. Start time and finish time are both computed by the scheduler, i.e. on the RM side.

          Not sure about rounding off - can you point me to the code which does that? I think we just use what's provided in the ApplicationSubmissionContext, i.e. it shouldn't be rounded off.
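
          A rough sketch of the event-driven bookkeeping described in this comment (class, method, and field names here are illustrative, not the code in the attached patch):

          import java.util.HashMap;
          import java.util.Map;

          /** Illustrative sketch of the tracking described above; not the attached patch. */
          public class AppResourceUsageTracker {

            private static class Allocation {
              final long memoryMb;      // memory reserved for the container, in MB
              final long allocatedAtMs; // timestamp of the scheduler's allocation event
              Allocation(long memoryMb, long allocatedAtMs) {
                this.memoryMb = memoryMb;
                this.allocatedAtMs = allocatedAtMs;
              }
            }

            private final Map<String, Allocation> running = new HashMap<>();
            private long memorySeconds = 0;

            /** Start "ticking" when the scheduler allocates a container. */
            public synchronized void containerAllocated(String containerId, long memoryMb, long nowMs) {
              running.put(containerId, new Allocation(memoryMb, nowMs));
            }

            /** Stop ticking when the scheduler gets the resources back (including failed starts). */
            public synchronized void containerFinished(String containerId, long nowMs) {
              Allocation a = running.remove(containerId);
              if (a != null) {
                memorySeconds += a.memoryMb * ((nowMs - a.allocatedAtMs) / 1000);
              }
            }

            /** Charge running containers up to "now" so the stat can be read while the app runs. */
            public synchronized long getMemorySeconds(long nowMs) {
              long total = memorySeconds;
              for (Allocation a : running.values()) {
                total += a.memoryMb * ((nowMs - a.allocatedAtMs) / 1000);
              }
              return total;
            }
          }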

          Jason Lowe added a comment -

          I think the accounting should be based on the RM's view of the cluster state, since it is controlling which applications get the resources. If the RM allocates a container for an application, whether the application ultimately uses it or not, it is a resource that is unavailable for other applications to use. Similarly if an RM allocates a container but it takes a while for the AM to heartbeat in to receive it, takes even longer for the AM to connect to the NM to launch it, and takes even longer for the NM to report back that the container completed, all of that is still time where the container's resources were unavailable to other applications. The RM was setting aside those resources on behalf of that application, therefore the application should be accountable.

          Bugs in the RM or scheduler may cause extraneous containers to be allocated, but they are still allocated on behalf of a particular application and should be charged accordingly.

          Jason Lowe added a comment -

          The latest patch no longer applies to trunk. Could you please refresh it? Some review comments:

          General:

          • Nit: the extra plurality of VirtualCoresSeconds sounds a bit odd, wondering if it should be VirtualCoreSeconds or VcoreSeconds in the various places it appears.

          ApplicationCLI:

          • UI wording: In the code it's vcore-seconds but the UI says CPU-seconds. I'm wondering if users are going to interpret CPU to be a hardware core, and I'm not sure a vcore will map to a hardware core in the typical case. The configuration properties refer to vcores, so we should probably use vcore-seconds here for consistency. Curious what others think about this, as I could be convinced to leave it as CPU.

          RMAppAttempt has just a spurious whitespace change

          RMAppAttemptImpl:

          • Nit: containerAllocated and containerFinished are private and always called from transitions, so acquiring the write lock is unnecessary.
          • ContainerFinishedTransition.transition does not call containerFinished when it's the AM container. We "leak" the AM container and consider it always running if an AM crashes.

          RMContainerEvent:

          • Nit: whitespace between the constructor definitions would be nice.

          TestRMAppAttemptTransitions:

          • Nit: it would be cleaner and easier to read if we add a new allocateApplicationAttemptAtTime method and have the existing allocateApplicationAttempt method simply call it with -1 rather than change all those places to pass -1.

          Speaking of "leaking" containers, is there something we can do to audit/assert that applications that have completed don't have running containers? If we lose track of a container finished event, the consumed resources are going to keep increasing indefinitely. It's a bug in the RM either way but wondering if there's some warning/sanity checking we can do to keep the metric from becoming utterly useless when it occurs. Capping it at the end of the application would at least prevent it from growing beyond the application lifetime. Then again, letting it grow continuously at least is more indicative something went terribly wrong with the accounting and therefore the metric can't be trusted. Just thinking out loud, not sure what the best solution is.

          Andrey Klochkov added a comment -

          Jason, thanks for the thorough review. Attaching the patch with fixes. I basically made all the fixes you're proposing except the last one about capturing the leak.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12606923/YARN-415--n4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2140//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2140//console

          This message is automatically generated.

          Andrey Klochkov added a comment -

          Fixing the failed test.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12607275/YARN-415--n5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2142//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2142//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Thanks for updating the patch, Andrey. The changes look good, and I think we're close. However I spent a bit more time thinking about the leak case, and there's at least one more place where we can leak the bookkeeping for containers in RMAppAttemptImpl.

          If an AM crashes while tasks are running then we'll get a CONTAINER_FINISHED event, clean up the AM container and move the app attempt to the FAILED state. However in the FAILED state we ignore any container events, and subsequent container events for tasks that are killed off after the AM attempt dies will be ignored. That means we leak the bookkeeping for some containers and the resource utilization of the app will increase indefinitely.

          Therefore I think with this change we can no longer ignore CONTAINER_ALLOCATED or CONTAINER_FINISHED events as is currently done in a number of states.

          There is the option of forcing the container list to be empty when we enter terminal states. This may help cover some leak scenarios but isn't totally correct from a bookkeeping standpoint if those containers are technically still active. Not an issue in the short-term because when an AM attempt dies all the active containers of the app are killed immediately afterwards. However I could see that potentially changing in the future for long-lived YARN apps (otherwise AM becomes SPoF for the whole app).

          Andrey Klochkov added a comment -

          With the first option it's not clear how to implement protection from leaks; there's no event which can be used to check for leaks in that case. At the same time, current YARN behavior does not support containers surviving after the AM is finished, so the second option is acceptable. This may need to change when there is support for long-lived apps and attempts which stay alive after the AM is stopped.

          Attaching a patch which implements option #2 and adds a test for it.
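
          Building on the tracker sketch earlier in this thread, option #2 amounts to something like the following method, invoked when the attempt reaches a terminal state (the later review refers to the patch's version as cleanupRunningContainers; this body is only an illustration, not the attached patch):

          /** Finalize usage for containers still tracked as running when the attempt ends. */
          public synchronized void cleanupRunningContainers(long nowMs) {
            for (Allocation a : running.values()) {
              memorySeconds += a.memoryMb * ((nowMs - a.allocatedAtMs) / 1000);
            }
            // Force the running-container list to be empty once a terminal state is reached,
            // so the aggregate metric stops growing even if finish events were missed.
            running.clear();
          }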

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12607895/YARN-415--n6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2161//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2161//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Thanks for the update Andrey. This change should resolve my concerns about the running container leaks, but there are some points that need to be addressed with respect to logging in cleanupRunningContainers:

          • If an application with many containers in-flight simply unregisters and exits expecting the RM to clean up the mess or the application simply crashes, we're going to log a lot of messages for all those containers. Currently the RM kills all current containers of an application already, so we're talking about being incorrect on the order of a few milliseconds for a sane RM. I think this should be an INFO rather than a WARN. Also we probably want to log a single message per application, stating how many containers were affected rather than specific ones, since we don't currently expose container-specific metrics anyway.
          • There's a "new memSec" log message that appears to be a debugging artifact that was left in the patch
          Andrey Klochkov added a comment -

          Thanks Jason. Attaching a fixed patch.

          Kendall Thrapp added a comment -

          Thanks Andrey for implementing this. I'm looking forward to being able to use it. Just a reminder to also update the REST API docs (http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API).

          Andrey Klochkov added a comment -

          Kendall, sure, I may update the Wiki as soon as the patch is committed. BTW how can I get write access to the Wiki?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12609031/YARN-415--n7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2208//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2208//console

          This message is automatically generated.

          Jason Lowe added a comment -

          The REST API doc change is a good catch, Kendall. The source for that doc is at hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm and should be updated as part of this patch.

          Andrey Klochkov added a comment -

          Adding changes in REST API docs to the patch.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12609068/YARN-415--n8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2213//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2213//console

          This message is automatically generated.

          Arun C Murthy added a comment -

          I'm sorry to come in late, I just did a cursory look.

          One question: Do we really need to track ResourceUsage for each Container? Can't we just add it up when a container finishes? Maybe I'm missing something? But, I'd like to not have a lot of per-container state if possible. Thanks.

          Andrey Klochkov added a comment -

          Arun, the idea is to have the stats updated in real time while the app is running. Is there a way to get a list of running containers assigned to the app, with their start times, without tracking it explicitly?

          Jason Lowe added a comment -

          It's not just a real-time issue, it's also a correctness issue. When a container finishes we need to know the time it was allocated. So regardless of whether we want to compute the usage in real-time, the start time of a container and its resource sizes need to be tracked somewhere in the RM.

          ResourceUsage is just a Resource plus a start time, and the Resource should be referencing the same object already referenced by the Container inside RMContainerImpl. To implement this feature we need to track the containers that are allocated/running (already being done by RMContainerImpl) and what time they started (which we are not currently doing and why ResourceUsage was created).

          There is the issue of the HashMap to map a container ID to its resource and start time. We could remove the need for this if we stored the container start time in RMContainerImpl and had a safe way to lookup containers for an application attempt. We can get the containers for an application via scheduler.getSchedulerAppInfo, and RMAppAttemptImpl already does this when generating an app report. However since RMAppAttemptImpl and the scheduler are running in separate threads, I could see the scheduler already removing the container before RMAppAttemptImpl received the container completion event and tried to lookup the container for usage calculation. Given the race, along with the fact that getSchedulerAppInfo is not necessarily cheap, it seems reasonable to have RMAppAttemptImpl track what it needs for running containers directly.
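
          For concreteness, the bookkeeping described here is essentially a small value class plus a map keyed by container ID. A sketch follows (Resource and ContainerId are real YARN record types; the class and field names are illustrative, not the attached patch):

          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          import org.apache.hadoop.yarn.api.records.ContainerId;
          import org.apache.hadoop.yarn.api.records.Resource;

          /** Sketch of the per-attempt bookkeeping described above; names are illustrative. */
          class AttemptUsageBookkeeping {

            /** "Just a Resource plus a start time." */
            static class ResourceUsage {
              final Resource resource;  // same Resource object referenced by the RMContainer
              final long startTimeMs;   // when the scheduler allocated the container

              ResourceUsage(Resource resource, long startTimeMs) {
                this.resource = resource;
                this.startTimeMs = startTimeMs;
              }
            }

            /** Maps a container ID to its resource and start time, as discussed above. */
            final Map<ContainerId, ResourceUsage> runningContainers = new ConcurrentHashMap<>();
          }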

          Sandy Ryza added a comment -

          However since RMAppAttemptImpl and the scheduler are running in separate threads, I could see the scheduler already removing the container before RMAppAttemptImpl received the container completion event and tried to lookup the container for usage calculation.

          The most accurate timestamp for a container start/end with respect to utilization/chargeback is when the scheduler allocates/releases it, as that's the moment the resource becomes inaccessible/accessible. Is there a reason we need to set these with an asynchronous event after the scheduler has completed the operation? Why not move the calculation entirely into the scheduler (providing code that could be shared across all schedulers)?

          Andrey Klochkov added a comment -

          IMO it makes sense to move this tracking into the scheduler, and in particular SchedulerApplication looks like a good place for this logic. I'm wondering why SchedulerApplication has everything abstract, while its descendants have a lot of the same fields and code. Why isn't the common code placed into SchedulerApplication itself? If I'm not missing anything here, I'd move all that code and this resource usage tracking into SchedulerApplication. Please comment.

          Sandy Ryza added a comment -

          I don't think there's any strong reason not to move a lot of the common code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication. Filed YARN-1335 for this.

          Andrey Klochkov added a comment -

          Adding YARN-1335 as a dependency.

          Andrey Klochkov added a comment -

          On second thought, the scheduler does not seem like a good place for these stats. It doesn't keep info on finished apps, so if the logic is placed in the scheduler, the tracking data will be gone as soon as an app is done. At the same time, app attempts are kept in the RMContext until evicted, so usage stats can be pulled out of there by an external system that handles persistence/reporting/etc.

          Sandy Ryza added a comment -

          The scheduler can write to a tracking object that the RM passes it.

          Jason Lowe added a comment -

          The most accurate timestamp for a container start/end with respect to utilization/chargeback is when the scheduler allocates/releases it, as that's the moment that the resource becomes inaccessible/accessible. Is there a reason we need to set these with an asynchronous event after the scheduler has completed the operation?

          The asynchronous event is generated in FiCaSchedulerApp.allocate, which is already called by the various schedulers right when the container is allocated. Note that the event is timestamped when it is generated, not when it is received, so the resource usage tracking has an accurate time of allocation when performing usage calculations. Also, this is not generating any new events; rather, it just uses the events which already exist and are required to be sent.

          Why not move the calculation entirely into the scheduler (providing code that could be shared across all schedulers).

          I'm not seeing how the proposed change is going to make the calculation significantly more accurate unless the RM is lagging seriously far behind in processing events, and the calculation will be accurate once the app completes, as all events will be processed by that time.

          I'm not totally against moving the calculation into FiCaSchedulerApp or something similar, but it does create new issues to solve that we didn't have before, like locking between the scheduler and RMAppAttemptImpl to share this resource usage report, and making sure it doesn't disappear when the scheduler's done with the app.

          Sandy Ryza added a comment -

          The asynchronous event is generated in FiCaSchedulerApp.allocate which is already called by the various schedulers right when the container is allocated.

          Didn't realize that. In that case, I don't have a strong opinion on whether it should be in the scheduler or not. Either way, it seems to me that there should only be one place in the RM that maps container IDs in an app attempt to containers (with their usages and start times), and we should use this map for calculating the usage of running containers. Currently this is in the scheduler, but we could move it into RMAppAttempt if that makes more sense and have the scheduler reference it there?

          Andrey Klochkov added a comment -

          Updated the patch moving tracking logic into Scheduler:

          • AppSchedulingInfo tracks resource usage. Existing methods are reused, and overall it seems like the right place to have this logic.
          • When the app is finished and the Scheduler evicts it from its cache, it sends a new type of event (RMAppAttemptAppFinishedEvent) to the attempt, attaching usage stats to the event.
          • The RMAppAttemptImpl test is modified accordingly.
          • A new test is added to verify resource tracking in AppSchedulingInfo.
          Andrey Klochkov added a comment -

          This scheme has a downside: the stats would be incorrect between two events, 1) the Scheduler evicting the app from the cache and sending an event, and 2) RMAppAttemptImpl receiving the event and updating its internal stats. The only idea I have is to add an additional roundtrip, extending this scheme to the following (sketched below):
          1. When the app is finished, the Scheduler sends an RMAppAttemptAppFinishedEvent instance and does not evict the app from the cache yet.
          2. RMAppAttemptImpl receives the event, updates its internal fields "finalMemorySeconds" and "finalVcoreSeconds", and sends a new type of event to the Scheduler allowing it to evict the app.
          3. The Scheduler gets the event and evicts the app.
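
          To make the ordering concrete, here is a minimal sketch of that roundtrip. Real RM communication goes through the dispatcher asynchronously; direct method calls are used here only to show the sequence, and apart from the RMAppAttemptAppFinishedEvent name mentioned above, every class, method, and field below is hypothetical.

          import java.util.HashMap;
          import java.util.Map;

          // Minimal, hypothetical sketch of the proposed scheduler/attempt roundtrip.
          public class EvictionHandshakeSketch {

            static class UsageStats {
              final long memorySeconds, vcoreSeconds;
              UsageStats(long m, long v) { memorySeconds = m; vcoreSeconds = v; }
            }

            /** Scheduler side: keeps per-attempt usage until the attempt says eviction is safe. */
            static class Scheduler {
              final Map<String, UsageStats> apps = new HashMap<String, UsageStats>();
              Attempt attempt;

              // Step 1: the app finished; send the stats but do NOT evict yet.
              void onAppFinished(String attemptId) {
                attempt.onAppFinishedEvent(attemptId, apps.get(attemptId), this);
              }

              // Step 3: the attempt has cached the totals; now it is safe to evict.
              void onEvictAllowed(String attemptId) {
                apps.remove(attemptId);
              }
            }

            /** RMAppAttemptImpl side: caches the final totals, then releases the scheduler entry. */
            static class Attempt {
              long finalMemorySeconds, finalVcoreSeconds;

              // Step 2: store the totals and signal the scheduler that eviction is safe.
              void onAppFinishedEvent(String attemptId, UsageStats stats, Scheduler scheduler) {
                finalMemorySeconds = stats.memorySeconds;
                finalVcoreSeconds = stats.vcoreSeconds;
                scheduler.onEvictAllowed(attemptId);
              }
            }

            public static void main(String[] args) {
              Scheduler scheduler = new Scheduler();
              scheduler.attempt = new Attempt();
              scheduler.apps.put("appattempt_1", new UsageStats(4096L * 120, 2L * 120));
              scheduler.onAppFinished("appattempt_1");
              System.out.println(scheduler.attempt.finalMemorySeconds + " MB-seconds");
            }
          }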

          Thoughts?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12610399/YARN-415--n9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2292//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2292//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2292//console

          This message is automatically generated.

          Jason Lowe added a comment -

          I haven't fully digested the latest patch yet, but here are some initial impressions:

          I believe Sandy's intention was to remove the need for a separate runningContainers map, but that map still exists in the patch and has simply been moved from RMAppAttemptImpl to SchedulerAppInfo. This necessitated a new event and added a new race condition, so I'm not sure this is a better overall approach.

          To remove the need for a separate runningContainers map, we need to reuse the place in the code where the schedulers are already tracking the active containers for an application, and that's SchedulerApplication.liveContainers. We could extend RMContainer to add the ability to obtain an allocation start time; then we can compute the resource consumption for the active containers in SchedulerApplication and roll it up into a usage total when the containers complete and are removed from liveContainers. Then at least we're eliminating an extra map to track active containers.

          As for the race condition, how about requiring schedulers to retain app attempts in their cache until signaled by RMAppAttemptImpl that it can be flushed? RMAppAttemptImpl already knows (eventually) when an application completes, and it can grab the latest app report with the rollup of resource usage from the scheduler, cache that usage locally into a total, then tell the scheduler via a new scheduler event that it can release the app attempt from its cache.
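
          To make the liveContainers rollup described above concrete, here is a small self-contained sketch, assuming a container object that exposes its reserved resource and an allocation start time. TrackedContainer and its allocationTimeMs field stand in for the proposed RMContainer change; none of the names below are the patch's actual classes.

          import java.util.HashMap;
          import java.util.Map;

          // Hypothetical sketch of per-attempt usage bookkeeping around liveContainers.
          public class AppUsageTracker {

            static class TrackedContainer {
              final int memoryMb;           // reserved memory for the container
              final int vcores;             // reserved virtual cores
              final long allocationTimeMs;  // when the scheduler allocated it

              TrackedContainer(int memoryMb, int vcores, long allocationTimeMs) {
                this.memoryMb = memoryMb;
                this.vcores = vcores;
                this.allocationTimeMs = allocationTimeMs;
              }
            }

            private final Map<String, TrackedContainer> liveContainers =
                new HashMap<String, TrackedContainer>();
            private long completedMemorySeconds;
            private long completedVcoreSeconds;

            void containerAllocated(String containerId, TrackedContainer c) {
              liveContainers.put(containerId, c);
            }

            // Roll a finished container into the completed totals and drop it from liveContainers.
            void containerCompleted(String containerId, long finishTimeMs) {
              TrackedContainer c = liveContainers.remove(containerId);
              if (c == null) {
                return; // already rolled up, or never tracked
              }
              long lifetimeSec = (finishTimeMs - c.allocationTimeMs) / 1000;
              completedMemorySeconds += c.memoryMb * lifetimeSec;
              completedVcoreSeconds += c.vcores * lifetimeSec;
            }

            // Aggregate usage so far = completed totals + usage of still-running containers.
            long memorySeconds(long nowMs) {
              long total = completedMemorySeconds;
              for (TrackedContainer c : liveContainers.values()) {
                total += c.memoryMb * ((nowMs - c.allocationTimeMs) / 1000);
              }
              return total;
            }
          }

          The point of the sketch is only that completed usage accumulates once per container while running usage is derived on demand from the live set, so no extra container map is needed.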

          Andrey Klochkov added a comment -

          Updating the patch:

          • Got rid of the "runningContainers" map.
          • Refactored SchedulerApplication by pulling some of the methods up from its descendants (allocate, containerCompleted, unreserve), making most of the fields private and modifying "containerCompleted" to do resource usage tracking.
          • Made the scheduler not evict the app immediately after the attempt is finished, but wait for a signal from RMAppAttemptImpl.
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12610446/YARN-415--n10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 10 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2294//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2294//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Thanks, Andrey, for updating the patch to the new design. Further comments:

          • The patch no longer applies after YARN-891 and needs to be refreshed.
          • I think RMAppAttemptEventType.APP_FINISHED should be something like RMAppAttemptEventType.APP_SCHEDULER_STATS and include the final resource usage stats. This eliminates an extra getSchedulerApp call that the app attempt currently has to make, and that call is not cheap. It also helps reduce some confusion about what APP_FINISHED really means and why the scheduler is sending it.
          Junping Du added a comment -

          This work is very important for making YARN serve as a public cloud service (at the PaaS layer). Andrey, thanks for working on this!
          Any new progress after Jason's last comments?

          Kendall Thrapp added a comment -

          Thanks Andrey for all your work on this! I'm looking forward to being able to use this. Any updates?

          Andrey Klochkov added a comment -

          I'm updating the patch to make it applicable to the current trunk. It's going slowly, but I hope to finish this week.

          Kendall Thrapp added a comment -

          Hi Andrey, any update on this? Thanks!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12610446/YARN-415--n10.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3612//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12610446/YARN-415--n10.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3613//console

          This message is automatically generated.

          Eric Payne added a comment -

          The Generic Application History Server stores all of the information about containers that is needed to calculate memory seconds and vcore seconds. Right now, since the Generic Application History Server is tied closely to the Timeline Server, this does not work on a secured cluster. Also, the information is only available via the REST API right now, and there would need to be some scripting and parsing of the REST APIs to roll up metrics for each app. So, I think this JIRA would still be very helpful and useful.

          FYI, on an unsecured cluster with the Generic Application History Server and the Timeline Server configured and running, the following REST APIs will give enough information about an app to calculate memory seconds and vcore seconds:

          Get list of app attempts for a specified appID

          curl --compressed -H "Accept: application/json" -X GET "http://<hostname>:<port>/ws/v1/applicationhistory/apps/<appID>/appattempts"

          For each app attempt, get all container info

          curl --compressed -H "Accept: application/json" -X GET "http://<hostname>:<port>/ws/v1/applicationhistory/apps/<appID>/appattempts/<appAttemptID>/containers"
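
          If one wanted to script the rollup mentioned above, a rough sketch might look like the following. It assumes Jackson 2.x on the classpath; the JSON wrapping ("containers"/"container") and the field names (allocatedMB, allocatedVCores, startedTime, finishedTime) are assumptions to verify against the actual response, not something confirmed here.

          import java.net.URL;
          import com.fasterxml.jackson.databind.JsonNode;
          import com.fasterxml.jackson.databind.ObjectMapper;

          // Rough rollup of MB-seconds / vcore-seconds from the application history REST API.
          // Endpoint layout follows the curl commands above; JSON field names are assumptions.
          public class AhsUsageRollup {
            public static void main(String[] args) throws Exception {
              String base = args[0];      // e.g. http://<hostname>:<port>
              String appId = args[1];
              String attemptId = args[2]; // one of the IDs from the appattempts call above

              URL url = new URL(base + "/ws/v1/applicationhistory/apps/" + appId
                  + "/appattempts/" + attemptId + "/containers");
              JsonNode root = new ObjectMapper().readTree(url);

              long memorySeconds = 0;
              long vcoreSeconds = 0;
              // Sum reserved-resource * lifetime over every container of the attempt.
              for (JsonNode c : root.path("containers").path("container")) {
                long lifetimeSec =
                    (c.path("finishedTime").asLong() - c.path("startedTime").asLong()) / 1000;
                memorySeconds += c.path("allocatedMB").asLong() * lifetimeSec;
                vcoreSeconds += c.path("allocatedVCores").asLong() * lifetimeSec;
              }
              System.out.println(memorySeconds + " MB-seconds, " + vcoreSeconds + " vcore-seconds");
            }
          }

          Run it with the base URL, application ID, and an attempt ID from the first call, and it prints that attempt's MB-seconds and vcore-seconds.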

          Eric Payne added a comment -

          Hi Andrey Klochkov. Thank you for the great work you did on this issue. I have refactored and upmerged your previous patch to conform to refactoring that has been done on the latest trunk and branch-2 source lines. Please let me know what you think.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648154/YARN-415.201405311749.txt
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3897//console

          This message is automatically generated.

          Eric Payne added a comment -

          Not sure, but it looks like the build may be broken:
          https://builds.apache.org/job/PreCommit-YARN-Build/3897//console says:

          cp: cannot stat `/home/jenkins/buildSupport/lib/*': No such file or directory

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3900//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3900//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3901//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3901//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3902//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3902//console

          This message is automatically generated.

          Eric Payne added a comment -

          I don't think the TestRMAdminCLI failures are a result of this patch.

          Please check out https://builds.apache.org/job/PreCommit-YARN-Build/3901/, https://builds.apache.org/job/PreCommit-YARN-Build/3900/, etc.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3903//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3903//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3904//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3904//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3905//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3905//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3906//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3906//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3907//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3907//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648183/YARN-415.201406031616.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestRMAdminCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3908//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3908//console

          This message is automatically generated.

          Andrey Klochkov added a comment -

          Eric Payne, thanks a lot for finishing this. Sorry I have never had enough time for it. My biggest question - and that's where I got stuck when I tried to rebase the patch on the latest trunk - is about the possibility of previous attempts being evicted from the Scheduler by subsequent attempts of the same app. Currently the Scheduler stores just the latest attempt (in SchedulerApplication.currentAttempt), and actually when the next attempt starts, the whole SchedulerApplication instance is replaced with a new one. I'm not sure if in fact this may happen before the usage report for the previous attempt is retrieved by the RM, but from the code it seems possible. If this is the case, then we need to somehow keep previous attempts of an app until reports for those are retrieved. I realize that with the current implementation this may not be done just by storing multiple attempts instead of just the current one in SchedulerApplication, because the Scheduler creates a new SchedulerApplication instance for every attempt. What do you think?

          Eric Payne added a comment -

          Hi Andrey. The scheduler keeps track of all attempts, and when it is time to calculate the metrics, it will loop through all attempts for the report. Prior to sending the APP_ATTEMPT_REMOVED event, RMAppAttemptImpl collects the metrics for the attempt. I've tested by manually killing the AM, causing a second attempt to start. This patch reports on both attempts.

          Eric Payne added a comment -

          Upmerging again so the patch will apply cleanly to both trunk and branch-2.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12652731/YARN-415.201406262136.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4107//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4107//console

          This message is automatically generated.

          Eric Payne added a comment -

          Test failures for TestRMApplicationHistoryWriter predate this patch.

          Eric Payne added a comment -

          This patch has been updated to handle the case where the AM container dies, the running containers continue, and a new AM container continues to manage the running containers.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12654144/YARN-415.201407042037.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4205//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4205//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4205//console

          This message is automatically generated.

          Wangda Tan added a comment -

          Hi Eric,
          Thanks for the patch, I think it's very important for YARN. A patch of mine, YARN-2181, has a similar purpose (tracking preemption info of an app attempt) but takes a different approach. I just looked at your patch; a few comments:

          YARN-415 needs to get two pieces of info: one from AppSchedulingInfo, for running containers, and another from SchedulerApplicationAttempt/RMAppAttempt, for completed containers.

          YARN-2181 doesn't need running-container info, and it puts most of its logic in RMAppAttempt. To make YARN-2181/YARN-415 more consistent, we have two choices here: one is to move the major logic of YARN-2181 to SchedulerApplicationAttempt and copy it back to RMAppAttempt as payloads of RMAppAttemptAppFinishedEvent, as YARN-415 does. The other is to move the completed-container resource usage calculation to RMAppAttempt, as YARN-2181 does.

          Personally, I prefer the latter because:
          1) We don't need to store completed-container resource usage info in two places (SchedulerApplication and RMAppAttempt).
          2) We don't need to add extra complexity to SchedulerApplicationAttempt/AppSchedulingInfo, which should stay focused on scheduling.
          3) Putting the calculation of a completed container's resource usage in RMAppAttempt#ContainerFinishedTransition is straightforward and doesn't need to access any fields in AppSchedulingInfo/SchedulerAppAttempt. Accessing scheduler fields from a different thread will block the scheduler thread, which seems bad to me. We can't let a UI-based requirement block the scheduler from making decisions.

          And for showing metrics on the UI, YARN-415 uses the route SchedulerAppReport->ApplicationReport->AppInfo. In AppInfo.java I found:

          +        ApplicationReport report = 
          +            app.createAndGetApplicationReport(null, hasAccess);
          +        ApplicationResourceUsageReport usageReport = 
          +            report.getApplicationResourceUsageReport();
          +        this.memorySeconds = usageReport.getMemorySeconds();
          +        this.vcoreSeconds = usageReport.getVcoreSeconds();
          

          You can see AppInfo gets a whole ApplicationReport from RMApp and only uses two of its fields, which is wasteful.
          I would suggest adding resource usage fields to AppMetrics/AppAttemptMetrics so that AppInfo can use them to render the UI, as YARN-2181 does.

          And some other review comments on YARN-415:
          1) It's better to add the changes to SchedulerApplication instead of to AppSchedulingInfo. AppSchedulingInfo is a critical class for schedulers to make decisions, so we should be careful about adding any fields there for purposes other than scheduling.
          2) We don't need to create a ResourceUsage class to track a container's resource/startTime; RMContainer already has these fields:

          +  private static class ResourceUsage {
          +    private final Resource resource;
          +    private final long startTimeMillis;
          

          3) An additional lock or a concurrent data structure should be used when we need to access scheduler fields (like AppSchedulingInfo/SchedulerApplication) from another thread. There are thread-safety problems, such as using a HashMap to store running containers' resource usage in AppSchedulingInfo.java:

          +  private final Map<ContainerId, ResourceUsage> runningContainersUsage = 
          +      new HashMap<ContainerId, ResourceUsage>();
          

          Do these comments make sense to you? Please feel free to let me know if you have any comments.

          Thanks,
          Wangda

          Eric Payne added a comment -

          This new patch addresses findbugs issues.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12654334/YARN-415.201407071542.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4210//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4210//console

          This message is automatically generated.

          Eric Payne added a comment -

          @wheeleast : Thank you very much for taking the time to review this patch.

          Can you please make sure you reviewed the latest patch? There were some old patches that contained changes to AppSchedulingInfo, but not the recent ones.

          Also, please keep in mind that YARN-415 needs to calculate resource usage for running applications as well as completed ones. To do this, it needs access to the live containers, and that list is kept in the SchedulerApplicationAttempt object.

          Eric Payne added a comment -

          Wangda Tan, Sorry, I don't think the previous post worked. Trying it again:

          Thank you very much for taking the time to review this patch.

          Can you please make sure you reviewed the latest patch? There were some old patches that contained changes to AppSchedulingInfo, but not the recent ones.

          Also, please keep in mind that YARN-415 needs to calculate resource usage for running applications as well as completed ones. To do this, it needs access to the live containers, and that list is kept in the SchedulerApplicationAttempt object.

          Wangda Tan added a comment -

          Hi Eric Payne,
          Thanks for the reply, I've looked at your patch again.

          Since this patch has a similar purpose to YARN-2181, and YARN-2181 has been committed, I suggest we modify YARN-415 as follows to make it consistent with the current approach:

          1) Add memory utilization to RMAppMetrics/RMAppAttemptMetrics
          2) Keep running container resource utilization in SchedulerApplicationAttempt
          3) Move completed container resource calculation to RMContainerImpl#FinishTransition

          Does it make sense to you? Please feel free to let me know if you have any comments.

          Thanks,
          Wangda

          Eric Payne added a comment -

          Hi Wangda Tan.

          Thank you very much for reviewing my patch.

          I think I understand what you are suggesting. Please let me clarify:

          1) Add memory utilization to RMAppMetrics/RMAppAttemptMetrics

          Since every RMAppAttemptImpl object has a reference to an RMAppAttemptMetrics object, you are suggesting that I move the resource usage stats to RMAppAttemptMetrics. Also, when reporting on resource usage, use the reporting methods from RMAppAttempt and RMApp.

          2) Keep running container resource utilization in SchedulerApplicationAttempt

          As the patch for YARN-415 currently stands, it keeps resource usage stats for both running and finished containers in the SchedulerApplicationAttempt object. Your suggestion is to keep resource usage stats only for running containers.

          3) Move completed container resource calculation to RMContainerImpl#FinishTransition

          For completed containers, you are suggesting that the calculation be done for final resource usage stats within the RMContainerImpl#FinishTransition method and have that send the resource stats as a payload within the RMAppAttemptContainerFinishedEvent event. Then, when RMAppAttemptImpl receives the CONTAINER_FINISHED event, it would add the resource usage stats for the finished containers to those already collected within the RMAppAttemptMetrics object.

          Is that correct?

          Wangda Tan added a comment -

          Hi Eric Payne,

          Since every RMAppAttemptImpl object has a reference to an RMAppAttemptMetrics object, you are suggesting that I move the resource usage stats to RMAppAttemptMetrics.

          Yes

          Also, when reporting on resource usage, use the reporting methods from RMAppAttempt and RMApp.

          I'm not quite sure what you mean by the "reporting methods"; it should be "getRMAppAttemptMetrics" on the attempt and "getRMAppMetrics" on the app.

          Your suggestion is to keep resource usage stats only for running containers.

          Yes

          For completed containers, you are suggesting that the calculation be done for final resource usage stats within the RMContainerImpl#FinishTransition method and have that send the resource stats as a payload within the RMAppAttemptC ...

          No. You can update to the current trunk code and check RMContainerImpl#FinishedTransition#updateMetricsIfPreempted; you can change "updateMetricsIfPreempted" to something like updateAttemptMetrics, and create a new method in RMAppAttemptMetrics, like "updateResourceUtilization". The benefit of doing this is that you don't need to send a payload to RMAppAttempt; all the information you need should already exist in RMContainer.
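          For illustration, a minimal standalone sketch of that flow. The method names (updateAttemptMetrics, updateResourceUtilization) come from this discussion; the simplified classes, fields and signatures below are assumptions for illustration only, not the actual patch:

              // Sketch: when a container finishes, fold (reserved resource * lifetime)
              // into the owning attempt's metrics. Both classes are simplified
              // stand-ins for RMAppAttemptMetrics and RMContainerImpl.
              class AttemptMetricsSketch {
                  private long memorySeconds;  // MB-seconds from finished containers
                  private long vcoreSeconds;   // vcore-seconds from finished containers

                  // Analogous to the suggested RMAppAttemptMetrics#updateResourceUtilization.
                  synchronized void updateResourceUtilization(int memoryMb, int vcores,
                                                              long usedMillis) {
                      long usedSeconds = usedMillis / 1000;
                      memorySeconds += (long) memoryMb * usedSeconds;
                      vcoreSeconds += (long) vcores * usedSeconds;
                  }

                  synchronized long getMemorySeconds() { return memorySeconds; }
                  synchronized long getVcoreSeconds()  { return vcoreSeconds; }
              }

              class ContainerFinishSketch {
                  // Analogous to renaming updateMetricsIfPreempted to updateAttemptMetrics
                  // in RMContainerImpl#FinishedTransition: the container already knows its
                  // reserved resource and its start/finish times, so no payload is needed.
                  static void updateAttemptMetrics(AttemptMetricsSketch attemptMetrics,
                                                   int memoryMb, int vcores,
                                                   long startMillis, long finishMillis) {
                      attemptMetrics.updateResourceUtilization(memoryMb, vcores,
                              finishMillis - startMillis);
                  }
              }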

          Do these make sense to you? Please feel free to let me know if you have any questions.

          Thanks,
          Wangda

          Eric Payne added a comment -

          Thanks, Wangda Tan.

          No. You can update to the current trunk code and check RMContainerImpl#FinishedTransition#updateMetricsIfPreempted; you can change "updateMetricsIfPreempted" to something like updateAttemptMetrics, and create a new method in RMAppAttemptMetrics, like "updateResourceUtilization". The benefit of doing this is that you don't need to send a payload to RMAppAttempt; all the information you need should already exist in RMContainer.

          updateMetricsIfPreempted() uses the current attempt. Is there a way to get the RMAppAttempt object for completed attempts? IIUC, there are races where there is no running attempt, such as when an attempt dies after other containers have started and then the app itself dies or is killed. Also, in the case of a "work-preserving restart," the app attempt could die and its child containers could be assigned to the second app attempt.

          I have included a new patch that adds the payload to the CONTAINER_FINISHED event, which is sent to the appropriate RMAppAttempt. The RMAppAttempt will then keep track of its own stats, even after the container for that app attempt has finished.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12656291/YARN-415.201407171553.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.util.TestFSDownload
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4345//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4345//console

          This message is automatically generated.

          Eric Payne added a comment -

          Thank you, Wangda Tan

          No. You can update to the current trunk code and check RMContainerImpl#FinishedTransition#updateMetricsIfPreempted; you can change "updateMetricsIfPreempted" to something like updateAttemptMetrics, and create a new method in RMAppAttemptMetrics, like "updateResourceUtilization". The benefit of doing this is that you don't need to send a payload to RMAppAttempt; all the information you need should already exist in RMContainer.

          I see that I can use RMApp#getRMAppAttempt to get the attempt that belongs to the container, so your suggestion will work for this use case. This is a cleaner solution.

          I have updated the patch with your suggestions. I am still looking into the test problems.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12656361/YARN-415.201407172144.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.util.TestFSDownload

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4351//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4351//console

          This message is automatically generated.

          Wangda Tan added a comment -

          Hi Eric Payne,
          Thanks for updating your patch. The failed test case should be unrelated to your changes; it is tracked by YARN-2270.
          Reviewing..

          Thanks,
          Wangda

          Wangda Tan added a comment -

          Hi Eric Payne,
          I've spent some time reviewing and thinking about this JIRA. I have a few suggestions:

          1. Revert the changes to SchedulerAppReport; we have already changed ApplicationResourceUsageReport, and memory utilization should be part of the resource usage report.

          2. Remove getMemory(VCore)Seconds from RMAppAttempt and modify RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization.

          3. put

                   ._("Resources:",
                      String.format("%d MB-seconds, %d vcore-seconds", 
                          app.getMemorySeconds(), app.getVcoreSeconds()))
          

          from "Application Overview" to "Application Metrics", and rename it to "Resource Seconds". It should be considered as a part of application metrics instead of overview.

          4. Change finishedMemory/VCoreSeconds to AtomicLong in RMAppAttemptMetrics so they can be accessed efficiently by multiple threads.

          5. I think it's better to add a new method in SchedulerApplicationAttempt, like getMemoryUtilization, which would only return memory/cpu seconds. We do this to avoid locking the scheduling thread when showing application metrics on the web UI.
          getMemoryUtilization would be used by RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization, and by SchedulerApplicationAttempt#getResourceUsageReport as well.

          The MemoryUtilization class may contain two fields: runningContainerMemory(VCore)Seconds.

          6. Since computing running-container resource utilization is not O(1), we need to scan all containers under an application. I think it's better to cache the previously computed result and recompute it only after several seconds (maybe 1-3 seconds should be enough) have elapsed.

          And you can modify SchedulerApplicationAttempt#liveContainers to be a ConcurrentHashMap. With #6, getting memory utilization to show metrics on the web UI will not lock the scheduling thread at all.
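          To make items 4-6 concrete, a rough standalone sketch under those assumptions; the class and field names and the 3-second window are illustrative only (the real SchedulerApplicationAttempt keys liveContainers by ContainerId and carries much more state):

              import java.util.concurrent.ConcurrentHashMap;
              import java.util.concurrent.atomic.AtomicLong;

              // Sketch of items 4-6: AtomicLong counters for finished usage (item 4),
              // a concurrent map of live containers, and a cached aggregate for running
              // containers that is recomputed at most every few seconds (item 6).
              class AttemptUsageSketch {
                  static class RunningContainer {
                      final int memoryMb;
                      final int vcores;
                      final long startMillis;
                      RunningContainer(int memoryMb, int vcores, long startMillis) {
                          this.memoryMb = memoryMb;
                          this.vcores = vcores;
                          this.startMillis = startMillis;
                      }
                  }

                  private static final long RECOMPUTE_INTERVAL_MS = 3000;

                  private final AtomicLong finishedMemorySeconds = new AtomicLong();
                  private final AtomicLong finishedVcoreSeconds = new AtomicLong();
                  private final ConcurrentHashMap<String, RunningContainer> liveContainers =
                          new ConcurrentHashMap<>();

                  private volatile long lastComputedMillis;
                  private volatile long cachedRunningMemorySeconds;
                  private volatile long cachedRunningVcoreSeconds;

                  // Item 4: the container-finished path can update these counters from
                  // another thread without extra locking.
                  void containerFinished(int memoryMb, int vcores, long usedMillis) {
                      long usedSeconds = usedMillis / 1000;
                      finishedMemorySeconds.addAndGet((long) memoryMb * usedSeconds);
                      finishedVcoreSeconds.addAndGet((long) vcores * usedSeconds);
                  }

                  // Item 6: scan liveContainers only if the cached aggregate is stale.
                  private synchronized void maybeRecompute(long nowMillis) {
                      if (nowMillis - lastComputedMillis < RECOMPUTE_INTERVAL_MS) {
                          return;
                      }
                      long memSec = 0, vcoreSec = 0;
                      for (RunningContainer c : liveContainers.values()) {
                          long usedSeconds = (nowMillis - c.startMillis) / 1000;
                          memSec += (long) c.memoryMb * usedSeconds;
                          vcoreSec += (long) c.vcores * usedSeconds;
                      }
                      cachedRunningMemorySeconds = memSec;
                      cachedRunningVcoreSeconds = vcoreSec;
                      lastComputedMillis = nowMillis;
                  }

                  long getMemorySeconds(long nowMillis) {
                      maybeRecompute(nowMillis);
                      return finishedMemorySeconds.get() + cachedRunningMemorySeconds;
                  }

                  long getVcoreSeconds(long nowMillis) {
                      maybeRecompute(nowMillis);
                      return finishedVcoreSeconds.get() + cachedRunningVcoreSeconds;
                  }
              }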

          Please let me know if you have any comments here,

          Thanks,
          Wangda

          Eric Payne added a comment -

          5. I think it's better to add a new method in SchedulerApplicationAttempt, like getMemoryUtilization, which would only return memory/cpu seconds. We do this to avoid locking the scheduling thread when showing application metrics on the web UI.
          getMemoryUtilization would be used by RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization, and by SchedulerApplicationAttempt#getResourceUsageReport as well.

          The MemoryUtilization class may contain two fields: runningContainerMemory(VCore)Seconds

          Wangda Tan, Thank you for your thorough analysis of this patch and for your detailed suggestions.

          I am working through them, and I think they are pretty clear, but this one is a little confusing to me. If I understand correctly, suggestion number 5 is to create SchedulerApplicationAttempt#getMemoryUtilization to be called from both SchedulerApplicationAttempt#getResourceUsageReport and RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds.

          Is that correct? If so, I have a couple of questions:

          • RMAppAttempt can access the scheduler via the 'scheduler' variable, but that is of type YarnScheduler, which does not have all of the interfaces available that AbstractYarnScheduler has. Are you suggesting that I add the getMemoryUtilization method to the YarnScheduler interface? Or, are you suggesting that the RMAppAttempt#scheduler variable be cast-ed to AbstractYarnScheduler? Or, am I missing the point?
          • When you say that a new class should be added called MemoryUtilization to be passed back to SchedulerApplicationAttempt#getResourceUsageReport, are you suggesting that that same structure should be added to ApplicationResourceUsageReport as a class variable in place of the current 'long memorySeconds' and 'long vcoreSeconds'? If so, I am a little reluctant to do that, since that structure would have to be passed across the protobuf interface to the client. It's possible, but seems riskier than just adding 2 longs to the API.

          Thank you very much.
          Eric Payne

          Wangda Tan added a comment -

          Hi Eric Payne,
          Thanks for your reply. I think the following makes complete sense:

          RMAppAttempt can access the scheduler via the 'scheduler' variable, but that is of type YarnScheduler, which does not have all of the interfaces available that AbstractYarnScheduler has. Are you suggesting that I add the getMemoryUtilization method to the YarnScheduler interface? Or, are you suggesting that the RMAppAttempt#scheduler variable be cast-ed to AbstractYarnScheduler? Or, am I missing the point?

          RMAppAttempt should only hold a reference to YarnScheduler; we shouldn't change that behavior, or users may fail when switching to a scheduler that doesn't inherit from AbstractYarnScheduler.
          So, accordingly, I would suggest ignoring #5, even if getAppAttemptMetrics may block the scheduler thread. We can think about how to improve this in a future, separate task.

          Thanks,
          Wangda

          Eric Payne added a comment -

          Wangda Tan, Thank you for your reply.

          I have implemented the following changes with the current patch.

          1. Revert the changes to SchedulerAppReport; we have already changed ApplicationResourceUsageReport, and memory utilization should be part of the resource usage report.

          Changes to SchedulerAppReport have been reverted.

          2. Remove getMemory(VCore)Seconds from RMAppAttempt and modify RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization.

          I have removed getters and setters from RMAppAttempt and added RMAppAttemptMetrics#getResourceUtilization, which returns a single ResourceUtilization instance that contains both memorySeconds and vcoreSeconds for the appAttempt. These include both finished and running statistics IF the appAttempt is ALSO the current attempt. If not, it only includes the finished statistics.

          3. put

                   ._("Resources:",
                      String.format("%d MB-seconds, %d vcore-seconds", 
                          app.getMemorySeconds(), app.getVcoreSeconds()))
          

          from "Application Overview" to "Application Metrics", and rename it to "Resource Seconds". It should be considered as a part of application metrics instead of overview.

          Changes completed.

          4. Change finishedMemory/VCoreSeconds to AtomicLong in RMAppAttemptMetrics so they can be accessed efficiently by multiple threads.

          Changes completed.

          5. I think it's better to add a new method in SchedulerApplicationAttempt, like getMemoryUtilization, which would only return memory/cpu seconds. We do this to avoid locking the scheduling thread when showing application metrics on the web UI.
          getMemoryUtilization would be used by RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running resource utilization, and by SchedulerApplicationAttempt#getResourceUsageReport as well.

          The MemoryUtilization class may contain two fields: runningContainerMemory(VCore)Seconds.

          Added ResourceUtilization (instead of MemoryUtilization), but did not make the other changes as per comment:
          https://issues.apache.org/jira/browse/YARN-415?focusedCommentId=14071181&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14071181

          6. Since computing running-container resource utilization is not O(1), we need to scan all containers under an application. I think it's better to cache the previously computed result and recompute it only after several seconds (maybe 1-3 seconds should be enough) have elapsed.

          I added cached values in SchedulerApplicationAttempt for memorySeconds and vcoreSeconds that are updated when 1) a request is received to calculate these metrics, AND 2) it has been more than 3 seconds since the last request.

          One thing I did notice when these values are cached is that there is a race where containers can get counted twice:

          • RMAppAttemptMetrics#getResourceUtilization sends a request to calculate running containers, and container X is almost finished. RMAppAttemptMetrics#getResourceUtilization adds the finished values to the running values and returns ResourceUtilization.
          • Container X completes and its memorySeconds and vcoreSeconds are added to the finished values for appAttempt.
          • RMAppAttemptMetrics#getResourceUtilization makes another request before the 3 second interval, and the cached values are added to the finished values for appAttempt.
            Since both the cached values and the finished values contain metrics for Container X, those are double counted until 3 seconds elapse and the next RMAppAttemptMetrics#getResourceUtilization request is made (illustrated in the sketch below).
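          As a concrete illustration of that window (all numbers are made up):

              // Container X: 2048 MB, ran for 100 seconds, and finished right after the
              // running-container cache was refreshed, so for up to ~3 seconds it shows
              // up in both the cached running aggregate and the finished totals.
              public class DoubleCountWindowSketch {
                  public static void main(String[] args) {
                      long containerXMemSeconds = 2048L * 100;    // 204800 MB-seconds

                      long cachedRunning = containerXMemSeconds;  // cache still includes X
                      long finished = containerXMemSeconds;       // X was just added here too
                      long reported = cachedRunning + finished;   // 409600 -- double counted

                      System.out.println("Reported until the next recompute: "
                              + reported + " MB-seconds");
                      // Once the cache is recomputed, X drops out of the running aggregate
                      // and the reported total returns to 204800 MB-seconds.
                  }
              }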

          And you can modify SchedulerApplicationAttempt#liveContainers to be a ConcurrentHashMap. With #6, getting memory utilization to show metrics on the web UI will not lock the scheduling thread at all.

          I am a little reluctant to modify the type of SchedulerApplicationAttempt#liveContainers as part of this JIRA. That seems like something that could be done separately.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12657484/YARN-415.201407232237.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
          org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4408//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4408//console

          This message is automatically generated.

          Wangda Tan added a comment -

          Hi Eric,
          Thanks for updating your patch. I don't have any major comments now.

          Following are some minor comments:
          1) RMAppAttemptImpl.java
          1.1 There're some irrelevant line changes in RMAppAttemptImpl, could you please revert them? Like

                     RMAppAttemptEventType.RECOVER, new AttemptRecoveredTransition())
          -          
          +
          

          1.2 getResourceUtilization:

          +    if (rmApps != null) {
          +      RMApp app = rmApps.get(attemptId.getApplicationId());
          +      if (app != null) {
          

          I think those two cases cannot happen; we don't need the null checks here.

          +          ApplicationResourceUsageReport appResUsageRpt =
          

          It's better to name it appResUsageReport since rpt is not a common abbr of report.

          2) RMContainerImpl.java
          2.1 updateAttemptMetrics:

                if (rmApps != null) {
                  RMApp rmApp = 
                      rmApps.get(container.getApplicationAttemptId().getApplicationId());
                  if (rmApp != null) {
          

          Again, I think the two null checks are unnecessary.

          3) SchedulerApplicationAttempt.java
          3.1 Some rename suggestions: (Please let me know if you have better idea)
          CACHE_MILLI -> MEMORY_UTILIZATION_CACHE_MILLISECONDS
          lastTime -> lastMemoryUtilizationUpdateTime
          cachedMemorySeconds -> lastMemorySeconds
          same for cachedVCore ...

          4) AppBlock.java
          Should we rename "Resource Seconds:" to "Resource Utilization" or something?

          5) Test
          5.1 I'm wondering if we need to add an end-to-end test, since we changed RMAppAttempt/RMContainerImpl/SchedulerApplicationAttempt.
          It could consist of submitting an application, launching several containers, and finishing the application. It would also be better if the launched application contained several application attempts.
          While the application is running, there would be multiple containers running and multiple containers finished; we can check whether the total resource utilization is as expected.

          To your comments:
          1)

          One thing I did notice when these values are cached is that there is a race where containers can get counted twice:

          I think this cannot be avoided completely; it should only be a transient state, and Jian He and I discussed this a long time ago.
          But apparently the 3-second cache makes it more than a transient state. I suggest you make "lastTime" in SchedulerApplicationAttempt protected, and in FiCaSchedulerApp/FSSchedulerApp, when a container is removed from liveContainers (in the completedContainer method), set lastTime to a negative value like -1. Then the next time the accumulated resource utilization is requested, all container utilization will be recomputed.
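          Roughly the pattern I have in mind (just a sketch with made-up names like UtilizationCache and lastUpdateTime, not actual patch code):

            // Cache memory-seconds for a few seconds, but let completedContainer()
            // invalidate the cache so a finished container is never counted twice.
            abstract class UtilizationCache {
              static final long CACHE_MILLISECONDS = 3000;
              private long lastUpdateTime = -1;   // -1 forces a recompute on the next read
              private long cachedMemorySeconds = 0;

              // Would scan liveContainers in the real code.
              protected abstract long recomputeMemorySeconds();

              synchronized long getMemorySeconds(long now) {
                if (lastUpdateTime < 0 || now - lastUpdateTime > CACHE_MILLISECONDS) {
                  cachedMemorySeconds = recomputeMemorySeconds();
                  lastUpdateTime = now;
                }
                return cachedMemorySeconds;
              }

              // Call this from completedContainer() when a container leaves liveContainers.
              synchronized void invalidate() {
                lastUpdateTime = -1;
              }
            }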

          2)

          I am a little reluctant to modify the type of SchedulerApplicationAttempt#liveContainers as part of this JIRA. That seems like something that could be done separately.

          I think that will be fine, because currently getRunningResourceUtilization is called by getResourceUsageReport, and getResourceUsageReport is synchronized; whether or not we change liveContainers to a concurrent map, we cannot solve the locking problem.
          I agree with enhancing it in a separate JIRA in the future.

          Thanks,
          Wangda

          Eric Payne added a comment -

          Wangda Tan

          Thank you very much for helping me review this.

          1) RMAppAttemptImpl.java
          1.1 There're some irrelevant line changes in RMAppAttemptImpl, could you please revert them? Like

                     RMAppAttemptEventType.RECOVER, new AttemptRecoveredTransition())
          -          
          +
          

          Changes completed.

          1.2 getResourceUtilization:

          +    if (rmApps != null) {
          +      RMApp app = rmApps.get(attemptId.getApplicationId());
          +      if (app != null) {
          

          I think those two cases cannot happen; we don't need the null checks here.

          Changes completed.

          +          ApplicationResourceUsageReport appResUsageRpt =
          

          It's better to name it appResUsageReport since rpt is not a common abbr of report.

          Changes completed.

          2) RMContainerImpl.java
          2.1 updateAttemptMetrics:

                if (rmApps != null) {
                  RMApp rmApp = 
                      rmApps.get(container.getApplicationAttemptId().getApplicationId());
                  if (rmApp != null) {
          

          Again, I think the two null checks are unnecessary.

          I was able to remove the rmApps variable, but I had to leave the check for app != null because if I try to take that out, several unit tests would fail with NullPointerException. Even with removing the rmApps variable, I needed to change TestRMContainerImpl.java to mock rmContext.getRMApps().

          3) SchedulerApplicationAttempt.java
          3.1 Some rename suggestions: (Please let me know if you have better idea)
          CACHE_MILLI -> MEMORY_UTILIZATION_CACHE_MILLISECONDS
          lastTime -> lastMemoryUtilizationUpdateTime
          cachedMemorySeconds -> lastMemorySeconds
          same for cachedVCore ...

          Changes complete.

          4) AppBlock.java
          Should we rename "Resource Seconds:" to "Resource Utilization" or something?

          I changed it as you suggested. It feels like there should be something that would describe it better, but I can't think of anything right now.

          5) Test
          5.1 I'm wondering if we need to add an end-to-end test, since we changed RMAppAttempt/RMContainerImpl/SchedulerApplicationAttempt.
          It could consist of submitting an application, launching several containers, and finishing the application. It would also be better if the launched application contained several application attempts.
          While the application is running, there would be multiple containers running and multiple containers finished; we can check whether the total resource utilization is as expected.

          I'm still working on the unit tests as you suggested, but I wanted to get the rest of the patch up first so you can look at it

          One thing I did notice when these values are cached is that there is a race where containers can get counted twice:

          I think this cannot be avoided completely; it should only be a transient state, and Jian He and I discussed this a long time ago.
          But apparently the 3-second cache makes it more than a transient state. I suggest you make "lastTime" in SchedulerApplicationAttempt protected, and in FiCaSchedulerApp/FSSchedulerApp, when a container is removed from liveContainers (in the completedContainer method), set lastTime to a negative value like -1. Then the next time the accumulated resource utilization is requested, all container utilization will be recomputed.

          I made the changes to lastTime as you suggested. I agree that there will always be a possibility that the container will still be in the liveContainers list for a very brief period after the container has finished. With the cached values the way I had them before, this gap was noticeable in the resource calculations. Your suggested changes brought the race back down even for the cached values.

          Wangda Tan added a comment -

          Hi Eric,
          Thanks for updating your patch again,

          To your comments,

          I was able to remove the rmApps variable, but I had to leave the check for app != null because if I try to take that out, several unit tests would fail with NullPointerException. Even with removing the rmApps variable, I needed to change TestRMContainerImpl.java to mock rmContext.getRMApps().

          I would suggest fixing such UTs instead of inserting kernel code just to make the UTs pass. I'm not sure how much effort that would take; if the effort is still reasonable, we should do it.

          I'm still working on the unit tests as you suggested, but I wanted to get the rest of the patch up first so you can look at it

          No problem, I can review your existing changes in the meantime.

          I've reviewed some details of your patch; one very minor comment:
          ApplicationCLI.java

          +      appReportStr.print("\tResources used : ");
          

          Do we need to change it to "Resource Utilization" as well?

          Otherwise the patch almost LGTM; looking forward to your new patch with the integration test.

          Thanks,
          Wangda

          Eric Payne added a comment -

          Wangda Tan
          Thanks for all of your help.
          How were you thinking an end-to-end test would work in the UT environment? In order to set a baseline and test that the containers ran for some predetermined and expected amount of time, wouldn't I need to somehow control the clock? Do you have any ideas on how to implement that?
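          For example, would injecting a small settable Clock, like the purely hypothetical sketch below (not part of my patch), be the right direction?

            import org.apache.hadoop.yarn.util.Clock;

            // A tiny test clock that the test can advance deterministically.
            public class SettableClock implements Clock {
              private long time = 0;

              @Override
              public long getTime() {
                return time;
              }

              public void advance(long millis) {
                time += millis;
              }
            }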

          In the meantime, I have made the additional changes you suggested. Please see below:

          I was able to remove the rmApps variable, but I had to leave the check for app != null because if I try to take that out, several unit tests would fail with NullPointerException. Even with removing the rmApps variable, I needed to change TestRMContainerImpl.java to mock rmContext.getRMApps().

          I would suggest fixing such UTs instead of inserting kernel code just to make the UTs pass. I'm not sure how much effort that would take; if the effort is still reasonable, we should do it.

          After some spy and mock magic, I was able to fix the unit tests so that the checks for "if != null" were not necessary.
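          Roughly, the mocking looks like this (a simplified, hypothetical sketch rather than the exact test code; the class name is made up):

            import static org.mockito.Mockito.mock;
            import static org.mockito.Mockito.when;

            import java.util.concurrent.ConcurrentHashMap;
            import java.util.concurrent.ConcurrentMap;

            import org.apache.hadoop.yarn.api.records.ApplicationId;
            import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
            import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;

            public class RMAppsMockingSketch {
              // Build an RMContext whose getRMApps() returns a map containing the given app,
              // so code under test that calls rmContext.getRMApps().get(appId) finds it.
              static RMContext contextWithApp(ApplicationId appId, RMApp app) {
                ConcurrentMap<ApplicationId, RMApp> apps =
                    new ConcurrentHashMap<ApplicationId, RMApp>();
                apps.put(appId, app);
                RMContext rmContext = mock(RMContext.class);
                when(rmContext.getRMApps()).thenReturn(apps);
                return rmContext;
              }
            }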

           ApplicationCLI.java
          +      appReportStr.print("\tResources used : ");
          

          Do we need to change it to "Resource Utilization" as well?

          Yes. I changed it to that.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12658211/YARN-415.201407281816.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
          org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
          org.apache.hadoop.yarn.client.TestRMFailover
          org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
          org.apache.hadoop.yarn.client.api.impl.TestNMClient
          org.apache.hadoop.yarn.client.TestGetGroups
          org.apache.hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl
          org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
          org.apache.hadoop.yarn.client.cli.TestYarnCLI
          org.apache.hadoop.yarn.client.api.impl.TestYarnClient
          org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication
          org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
          org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
          org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
          org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
          org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4459//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4459//console

          This message is automatically generated.

          Wangda Tan added a comment -

          Hi Eric Payne,
          Thanks for updating your patch,
          For the e2e test, I think we can do it this way: you can refer to the tests in TestRMRestart.
          Using MockRM/MockAM can do such a test; even though it's not a complete e2e test, most of the logic is covered. I suggest we cover the following cases:

          * Create an app; before the AM is submitted, resource utilization should be 0
          * Submit the AM; while the AM is running, its resource utilization should be > 0
          * Allocate some containers and finish them, then check the total resource utilization
          * Finish the application attempt and check the total resource utilization
          * Start a new application attempt and check that the resource utilization of the previous attempt is added to the total resource utilization
          * Check that resource utilization can be persisted/read across an RM restart
          

          Do you have any comments on this?
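          A rough sketch of the first two cases with MockRM, just to illustrate (the usage accessor getRMAppMetrics().getMemorySeconds() is assumed from your patch, and the MockRM helper signatures may differ slightly between branches):

            import org.apache.hadoop.yarn.conf.YarnConfiguration;
            import org.apache.hadoop.yarn.server.resourcemanager.MockNM;
            import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
            import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;

            public class UsageMetricsE2ESketch {
              public static void main(String[] args) throws Exception {
                MockRM rm = new MockRM(new YarnConfiguration());
                rm.start();
                MockNM nm = rm.registerNode("127.0.0.1:1234", 8192);

                // Case 1: before the AM is launched, usage should be 0.
                RMApp app = rm.submitApp(1024);
                // expectation: app.getRMAppMetrics().getMemorySeconds() == 0

                // Case 2: once the AM is running, usage should start accumulating.
                MockRM.launchAndRegisterAM(app, rm, nm);
                Thread.sleep(1000);
                // expectation: app.getRMAppMetrics().getMemorySeconds() > 0

                rm.stop();
              }
            }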

          Thanks,
          Wangda

          Eric Payne added a comment -

          Wangda Tan, Thank you for your suggestions. I added an end-to-end unit test that covered most of your points. However, I had trouble setting up a test with more than one attempt for the same app. I think I covered the rest.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12660287/YARN-415.201408062232.txt
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4542//console

          This message is automatically generated.

          Wangda Tan added a comment -

          Hi Eric,
          Thanks for your hard work adding these e2e tests.

          However, I had trouble setting up a test with more than one attempt for the same app. I think I covered the rest.

          I suggest you refer to TestAMRestart#testAMRestartWithExistingContainers as an example. Please let me know if you still have problems setting up the multi-attempt test.

          Some minor suggestions:
          1)

          +  private final static File TEMP_DIR = new File(System.getProperty(
          +      "test.build.data", "/tmp"), "decommision");
          

          I think it isn't used by the test.

          2)

          + Assert.assertTrue(YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS > 1);

          We don't need this assert

          3)

          +System.out.println("EEP 001");

          It's better to remove such personal debug-info.

          4)

          +    conf.setInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
          +        YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
          

          It's better to put such logic in setup().

          And please update your patch against trunk,

          Thanks,
          Wangda

          Eric Payne added a comment -

          Wangda Tan

          Thank you for all of your reviews.

          However, I had trouble setting up a test with more than one attempt for the same app. I think I covered the rest.

          I suggest you refer to TestAMRestart#testAMRestartWithExistingContainers as an example. Please let me know if you still have problems setting up the multi-attempt test.

          Thanks. I think I've got it now.

          +  private final static File TEMP_DIR = new File(System.getProperty(
          +      "test.build.data", "/tmp"), "decommision");
          

          I think it isn't used by the test.

          Good catch. I removed the declaration.

          + Assert.assertTrue(YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS > 1);
          

          We don't need this assert

          I removed it.

          +System.out.println("EEP 001");
          

          It's better to remove such personal debug-info.

          I'm sorry. This was an unprofessional oversight. I removed them.

          +    conf.setInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
          +        YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
          

          It's better to put such logic in setup().

          I moved this code to the setup() method.

          And please update your patch against trunk,

          I did so. Thank you again.

          Wangda Tan added a comment -

          Hi Eric Payne,
          It's great to see so many cleanups in your new patch; I think it almost looks good to me.
          One minor comment:
          I found that testUsageAfterAMRestartWithMultipleContainers and testUsageAfterAMRestartKeepContainers are very similar; could you find a way to create a common test method for them, with the only difference being a boolean keepContainer parameter?
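          Something like this, just as a sketch (amRestartUsageHelper and the class name are made up, not actual patch code):

            import org.junit.Test;

            public class ContainerUsageTestSketch {
              @Test
              public void testUsageAfterAMRestartWithMultipleContainers() throws Exception {
                amRestartUsageHelper(false);
              }

              @Test
              public void testUsageAfterAMRestartKeepContainers() throws Exception {
                amRestartUsageHelper(true);
              }

              // Shared body: keepContainers would control whether the submission keeps
              // containers across application attempts (work-preserving AM restart).
              private void amRestartUsageHelper(boolean keepContainers) throws Exception {
                // common setup, AM restart, and memory/vcore-seconds assertions go here
              }
            }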

          Thanks,
          Wangda

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12660535/YARN-415.201408080204.txt
          against trunk revision .

          -1 patch. Trunk compilation may be broken.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4557//console

          This message is automatically generated.

          Eric Payne added a comment -

          Wangda Tan,

          Thank you for your suggestions. I combined the tests as described below.

          One minor comment:
          I found that testUsageAfterAMRestartWithMultipleContainers and testUsageAfterAMRestartKeepContainers are very similar; could you find a way to create a common test method for them, with the only difference being a boolean keepContainer parameter?

          I created a common method that both of these call.

          I also noticed that testUsageWithMultipleContainers was doing similar things to testUsageAfterRMRestart, so I combined them both into testUsageWithMultipleContainersAndRMRestart.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12660822/YARN-415.201408092006.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 10 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.cli.TestYarnCLI
          org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4574//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4574//console

          This message is automatically generated.

          Wangda Tan added a comment -

          Eric Payne,

          I created a common method that both of these call.

          Thanks!

          I also noticed that testUsageWithMultipleContainers was doing similar things to testUsageAfterRMRestart, so I combined them both into testUsageWithMultipleContainersAndRMRestart.

          Good catch,

          I don't have further comments, but would you please check the test failures above?

          Thanks,
          Wangda

          Wangda Tan added a comment -

          Jian He, would you like to take a look at it?

          Karthik Kambatla (Inactive) added a comment -

          Recently, I have been thinking about related improvements and would like to take a look at this patch. Can I get a couple of days to review? Thanks.

          Jian He added a comment -

          Karthik, sure, thanks for reviewing the patch.

          Eric Payne added a comment -

          Thanks, Wangda Tan, Karthik Kambatla, and Jian He, for taking the time to review this patch.

          I am updating the patch to cleanly merge onto Trunk and Branch-2, and to fix an NPE that this patch was causing in TestRMAppTransitions.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12661548/YARN-415.201408132109.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.cli.TestYarnCLI

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4617//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4617//console

          This message is automatically generated.

          Jian He added a comment -

          Hi Eric, thanks for working on the patch.
          some comments on the patch:

          • we can reuse the previous rmAttempt and resource object
                    RMAppAttempt rmAttempt = container.rmContext.getRMApps()
                               .get(container.getApplicationAttemptId().getApplicationId())
                               .getRMAppAttempt(container.getApplicationAttemptId());
            Resource resource = container.getContainer().getResource();
            
          • RMContainerImpl#updateAttemptMetrics: please fix the indent format
          • Can you please elaborate on the scenario in which we need the following extra check? Also, for “currentAttempt.getAppAttemptId().compareTo(attemptId) == 0”, we can use equals instead, which looks more intuitive.
                // Only add in the running containers if this is the active attempt.
                RMAppAttempt currentAttempt = rmContext.getRMApps()
                               .get(attemptId.getApplicationId()).getCurrentAppAttempt();
                if (currentAttempt != null &&
                    currentAttempt.getAppAttemptId().compareTo(attemptId) == 0) {
                  ApplicationResourceUsageReport appResUsageReport = rmContext
                        .getScheduler().getAppResourceUsageReport(attemptId);
                  if (appResUsageReport != null) {
                    memorySeconds += appResUsageReport.getMemorySeconds();
                    vcoreSeconds += appResUsageReport.getVcoreSeconds();
                  }
                }
            
          • getFinishedMemorySeconds and getFinishedVcoreSeconds methods are not used.
          • For setFinishedVcoreSeconds and setFinishedMemorySeconds, we can just use updateResourceUtilization
          • RMStateStore#removeApplication: no need to calculate the memory utilization when removing the app; this saves some cost in the loop over attempts.
          Eric Payne added a comment -

          Jian He, Thank you very much for reviewing this patch.

          • we can reuse the previous rmAttempt and resource object
                    RMAppAttempt rmAttempt = container.rmContext.getRMApps()
                               .get(container.getApplicationAttemptId().getApplicationId())
                               .getRMAppAttempt(container.getApplicationAttemptId());
            Resource resource = container.getContainer().getResource();
            

          I will reuse the Resource object, but I'm not sure if I can reuse the RMAppAttempt object.

          In the following code snippet, the preemption path is always updating the attempt metrics for the current app attempt. In the chargeback (resource utilization metrics) path, that's not always what we want. Containers do not always complete before a current attempt dies and a new one is started. If this happens, the chargeback path should update the metrics for the first attempt, not the second one. The call to ...getRMAppAttempt(container.getApplicationAttemptId()) will always get the attempt that started the container.

          Now that I think about it, it seems like that is what we want in the preemption path as well.

          Wangda Tan, can you please comment? If the preemption path should update the preemption info for the attempt that started the finished container, then we can reuse the RMAppAttempt object for both paths.

                if (ContainerExitStatus.PREEMPTED == container.finishedStatus
                  .getExitStatus()) {
                  Resource resource = container.getContainer().getResource();
                  RMAppAttempt rmAttempt =
                      container.rmContext.getRMApps()
                        .get(container.getApplicationAttemptId().getApplicationId())
                        .getCurrentAppAttempt();
                  rmAttempt.getRMAppAttemptMetrics().updatePreemptionInfo(resource,
                    container);
                }
          
                  RMAppAttempt rmAttempt = container.rmContext.getRMApps()
                             .get(container.getApplicationAttemptId().getApplicationId())
                             .getRMAppAttempt(container.getApplicationAttemptId());
          
          Eric Payne added a comment -
          • Can you please elaborate on the scenario in which we need the following extra check?
                // Only add in the running containers if this is the active attempt.
                RMAppAttempt currentAttempt = rmContext.getRMApps()
                               .get(attemptId.getApplicationId()).getCurrentAppAttempt();
                if (currentAttempt != null &&
                    currentAttempt.getAppAttemptId().compareTo(attemptId) == 0) {
                  ApplicationResourceUsageReport appResUsageReport = rmContext
                        .getScheduler().getAppResourceUsageReport(attemptId);
                  if (appResUsageReport != null) {
                    memorySeconds += appResUsageReport.getMemorySeconds();
                    vcoreSeconds += appResUsageReport.getVcoreSeconds();
                  }
                }
            

          An app could have multiple attempts if, for example, the first attempt died in the middle and the RM starts a second attempt for this app. In that situation, when RMAppAttemptMetrics#getRMAppMetrics is called for the first attempt, we only want to report the info for the completed containers, and when it is called for the second (running) attempt, we want to report for both completed and running containers. Of course, this is a little misleading when you have work-preserving restart enabled, and the running containers didn't die with the first attempt. While they are running, they are reported as the metrics for the second attempt, but when they complete, their metrics go back into the first attempt. Since these metrics are only reported at the app level, I think this should be okay. The important thing is that the running metrics only get reported once and don't get double-counted.

          • Also, for currentAttempt.getAppAttemptId().compareTo(attemptId) == 0, we can use equals instead, which looks more intuitive.

          Good point. I made the change.

          • getFinishedMemorySeconds and getFinishedVcoreSeconds methods are not used.
          • For setFinishedVcoreSeconds and setFinishedMemorySeconds, we can just use updateResourceUtilization

          I used updateResourceUtilization as you suggested, and removed the getters and setters.

          • RMStateStore#removeApplication: no need to calculate the memory utilization when removing the app. This saves some cost in the loop over attempts.

          Good catch. I removed this calculation.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12661961/YARN-415.201408150030.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4627//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4627//console

          This message is automatically generated.

          Jian He added a comment -

          While they are running, they are reported as the metrics for the second attempt, but when they complete, their metrics go back into the first attempt.

          Because of this, for consistency, I think we should use getCurrentAttempt to charge finished containers against the current attempt for work-preserving AM restart as well?

              private static void updateAttemptMetrics(RMContainerImpl container) {
                // If this is a preempted container, update preemption metrics
                Resource resource = container.getContainer().getResource();
                RMAppAttempt rmAttempt = container.rmContext.getRMApps()
                    .get(container.getApplicationAttemptId().getApplicationId())
                    .getRMAppAttempt(container.getApplicationAttemptId());
          
          Karthik Kambatla (Inactive) added a comment -

          Just took a look at the patch. The major concern I have is the use of RMStateStore to store app resource usage information. If we add more resources and other statistics, storing all of them in the RM state store could place too much overhead on the store, particularly if it is ZKRMStateStore. Would it make more sense to store this information in the History/Timeline store?

          A more minor comment: Can we rename ResourceUtilization to AggregateAppResourceUsage for clarity?

          Heads-up (related): I am interested in tracking the utilization (memory, vcores, ...) as a discrete time series for analysis. Analysis after the app run can help tune the job better. Analysis during the app run could help with scheduling. It would be nice to make storage decisions that would help both use cases.

          Jian He added a comment -

          If we add more resources and other statistics, storing all of them in the RM state store could place too much overhead on the store

          Karthik, thanks for taking a look at the patch. In fact, the store operation only happens when the app finishes (the same route by which we store the final state of the app), so the patch doesn't add more store invocations.

          Karthik Kambatla (Inactive) added a comment -

          In fact, the store operation only happens when the app finishes

          We will be sending more data to the store, no? Just sending these two values isn't a big deal, but when we send 15 such values for every application it could be.

          Also, what happens if the RM goes down while the app is running? If we store only on app finish, would we lose the numbers from previous app attempts in the case of a non-work-preserving restart? In the case of a work-preserving restart, would we miss out on partial data?

          Eric Payne added a comment -

          Jian He and Karthik Kambatla
          Thank you both for your comments.

          Jian He wrote:

          Because of this, for consistency, I think we should use getCurrentAttempt to charge finished containers against the current attempt for work-preserving AM restart as well?

          If I understand correctly, is the suggestion that all finished containers be charged against the current attempt? That would be tricky, since even in normal use cases, an attempt can be in the complete state before all of its containers are finished. Also, if the first attempt dies after some of its containers are finished, would the metrics for the finished containers need to be transferred to the new attempt? I think that, since the metrics are reported at the app level, charging the running containers to the current attempt until the containers finish will be seamless to the end user. One thing that could be done is to have RMAppAttemptMetrics#getRMAppMetrics get a copy of the liveContainers and report only on the ones applicable to that attempt. That seems like overhead that may not be necessary.

          Karthik Kambatla wrote:

          Just took a look at the patch. The major concern I have is the use of RMStateStore to store app resource usage information. If we add more resources and other statistics, storing all of them in the RM state store could place too much overhead on the store, particularly if it is ZKRMStateStore. Would it make more sense to store this information in the History/Timeline store?

          Can you please help me to understand in more detail how this would be accomplished?

          Zhijie Shen added a comment -

          Would it make more sense to store this information in the History/Timeline store?

          The timeline server is a good place for storing the metrics, and it hides the details of persisting the data from you. However, I'm a bit concerned that it expands the set of dependencies required for RM restart/failover. Won't deployment of the RM become more complicated?

          Jian He added a comment -

          an attempt can be in the complete state before all of its containers are finished

          CapacityScheduler#doneApplicationAttempt (FairScheduler#removeApplicationAttempt) synchronously finishes all the live containers, so I think all containers should be guaranteed to finish before the attempt finishes.

          charging the running containers to the current attempt until the containers finish will be seamless to the end user.

          Particularly in work-preserving AM restart, the current AM is actually the one managing the previously running containers. Running containers in the scheduler are already transferred to the current AM, so the running containers' metrics are transferred as well. I think it'll be confusing if finished containers are still charged back against the previous, dead attempt. Btw, YARN-1809 will add the attempt web page, where we could also show attempt-specific metrics.

          Regarding the problem of metrics persistence: agreed that this doesn't solve the problem for running apps in general. Maybe we can handle the state store changes in a separate JIRA and discuss more there, so that we can get this in first.

          Karthik Kambatla (Inactive) added a comment -

          Eric Payne - Sorry again for coming in so late. I am not completely sure myself (yet) how we can use the timeline server or if it makes sense to do that. I guess I need to first understand what we are trying to accomplish here. Could you please correct me/comment on the following items?

          1. The goal is to capture memory utilization at the app level for chargeback. I like the goal, but would like to understand the use cases we have in mind. Is the chargeback simply to track the usage and maybe financially charge the users? Or is it to influence future scheduling decisions? I agree that the RM should facilitate collecting this information, but should the collected info be available to the RM for future use? If not, do we want the RM to serve this info?
          2. Do we want to charge the app only for the resources used to do meaningful work or do we also want to include failed/preempted containers? If we don't charge the app for failed containers, who are they charged to? Are we okay with letting some resources go uncharged?
          3. How soon do we want this usage information? It might make sense to collect/expose this once the app is finished for certain kinds of applications. What is our story for long-running applications?

          As Jian suggested, I would be up for getting in those parts that we are clear about and file follow-up JIRAs for those that need more discussion.

          Eric Payne added a comment -

          Jian He, thank you for your continuing reviews and comments.

          Particularly in work-preserving AM restart, the current AM is actually the one managing the previously running containers. Running containers in the scheduler are already transferred to the current AM, so the running containers' metrics are transferred as well. I think it'll be confusing if finished containers are still charged back against the previous, dead attempt. Btw, YARN-1809 will add the attempt web page, where we could also show attempt-specific metrics.

          You are correct. In the work-preserving AM restart case, the live containers are transferred to the new attempt for the remaining lifetime of the container, and then when the container completes, the original attempt gets the CONTAINER_FINISHED event. But I see your point about being consistent in the work-preserving AM restart case. So, I have attached a patch which will charge container usage to the current attempt, whether the container is running or completed.
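          A minimal sketch of that direction, based on the updateAttemptMetrics snippet quoted earlier in this thread. The lifetime accounting and the accumulator method name (updateAggregateAppResourceUsage) are my approximations, not necessarily what the attached patch does:

              private static void updateAttemptMetrics(RMContainerImpl container) {
                Resource resource = container.getContainer().getResource();
                // Always look up the *current* attempt, so containers that outlive an
                // attempt (work-preserving AM restart) are charged consistently.
                RMAppAttempt rmAttempt = container.rmContext.getRMApps()
                    .get(container.getApplicationAttemptId().getApplicationId())
                    .getCurrentAppAttempt();
                if (rmAttempt == null) {
                  return;
                }
                if (ContainerExitStatus.PREEMPTED == container.finishedStatus
                    .getExitStatus()) {
                  rmAttempt.getRMAppAttemptMetrics().updatePreemptionInfo(resource,
                      container);
                }
                // Accumulate MB-seconds and vcore-seconds over the container's lifetime
                // (creation/finish time accessors are assumed to exist on the container).
                long usedMillis = container.getFinishTime() - container.getCreationTime();
                long memorySeconds = resource.getMemory() * usedMillis / 1000;
                long vcoreSeconds = resource.getVirtualCores() * usedMillis / 1000;
                rmAttempt.getRMAppAttemptMetrics()
                    .updateAggregateAppResourceUsage(memorySeconds, vcoreSeconds);
              }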

          Regarding the problem of metrics persistence: agreed that this doesn't solve the problem for running apps in general. Maybe we can handle the state store changes in a separate JIRA and discuss more there, so that we can get this in first.

          Yes, I would appreciate it if we could continue this discussion on a separate JIRA.

          Eric Payne added a comment -

          Karthik Kambatla, thank you for taking the time to review this patch.

          I would like to see if Kendall Thrapp could comment on your use case questions, but here are my initial thoughts:

          1. Is the chargeback simply to track the usage and maybe financially charge the users? Or is it to influence future scheduling decisions? I agree that the RM should facilitate collecting this information, but should the collected info be available to the RM for future use? If not, do we want the RM to serve this info?

          Potential goals could be:

          1. report (and charge for) grid usage
          2. eventually limit job submission based on a user's budget

            2. Do we want to charge the app only for the resources used to do meaningful work or do we also want to include failed/preempted containers? If we don't charge the app for failed containers, who are they charged to? Are we okay with letting some resources go uncharged?

            This implementation does charge the app for failed containers. This was debated somewhat previously in this JIRA, because if the failure was due to preemption or a bug that wasn't the app's "fault," it may be unfair to charge the app for those. However, it is very unclear how one could programmatically determine whose "fault" the failure is.

          3. How soon do we want this usage information? It might make sense to collect/expose this once the app is finished for certain kinds of applications. What is our story for long-running applications?

          There is a specific use case for determining the usage at runtime. Again, I would hope that Kendall Thrapp could elaborate on this.

          Kendall Thrapp added a comment -

          1. Is the chargeback simply to track the usage and maybe financially charge the users? Or is it to influence future scheduling decisions? I agree that the RM should facilitate collecting this information, but should the collected info be available to the RM for future use? If not, do we want the RM to serve this info?

          In addition to the goals Eric Payne listed, another goal is to make it easier for users to compare how code changes to a particular recurring Hadoop job affect its resource usage. Assuming the input data size didn't significantly change, it'd be much more apparent to the user after a code change if there was a resulting significant change in the resource usage for their job. Even without charging, I'm hoping that having the resource usage shown to the user, without any extra work on their part, will make more people think about their overall grid resource usage instead of just run times.

          Eric Payne added a comment -

          The last file attachment didn't trigger Hadoop QA.

          Eric Payne added a comment -

          Reattaching the latest patch in order to trigger Hadoop QA.

          Jian He, thank you for all of your help and input. This patch will charge container usage to the current attempt, whether the container is running or completed. Will you please take a look at it again?

          Karthik Kambatla (Inactive) added a comment -

          A quick comment before we commit this.

          IIUC, we are tracking the allocation and not utilization. Actual utilization could be smaller than the amount of resources allocated (or asked for). Can we update the title and the corresponding class/field names accordingly? Also, the values are accumulated for the duration of the app. Can we add aggregate in the required class/field names?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12663246/YARN-415.201408181938.txt
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4678//console

          This message is automatically generated.

          Eric Payne added a comment -

          Thanks, Karthik Kambatla

          IIUC, we are tracking the allocation and not utilization. Actual utilization could be smaller than the amount of resources allocated (or asked for).

          I think I understand what you mean. An app could be allocated more resources than it ends up using, but the allocation is what was assigned to the app, not the amount actually used.
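          To illustrate with made-up numbers: a container that holds a 2048 MB, 1-vcore allocation for 100 seconds contributes 2048 × 100 = 204,800 MB-seconds (and 100 vcore-seconds) to the app's aggregate, regardless of how much of that allocation the process actually used.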

          Can we update the title and the corresponding class/field names accordingly? Also, the values are accumulated for the duration of the app. Can we add aggregate in the required class/field names?

          Can you please clarify what you mean by title? I will go ahead and modify the class and field names.

          Kendall Thrapp added a comment -

          Updated the JIRA title to say allocation instead of utilization.

          Jian He added a comment -

          Eric, thanks for your effort on this issue! We may need a test to verify that the RMAppAttempt metrics are also updated when a container is finished, i.e. to test the code change in the RMContainerFinished transition?

          Eric Payne added a comment -

          Karthik Kambatla, Thanks for your help in reviewing this.

          A more minor comment: Can we rename ResourceUtilization to AggregateAppResourceUsage for clarity?

          Can we update the title and the corresponding class/field names accordingly? Also, the values are accumulated for the duration of the app. Can we add aggregate in the required class/field names?

          With this patch, I changed the names of the ResourceUtilization class to AggregateAppResourceUsage, and modified appropriate fields, variable names, and method names.

          Jian He, I appreciate the time you have put in on helping me review this patch and your comments.

          We may need a test to verify that the RMAppAttempt metrics are also updated when a container is finished, i.e. to test the code change in the RMContainerFinished transition?

          Doesn't this transition get tested when the container metrics are summed and compared to the output of RMAppImpl#getRMAppMetrics? For example, as in TestContainerResourceUsage#testUsageWithMultipleContainersAndRMRestart, and in some of the other tests in TestContainerResourceUsage, each container's resources are accessed and summed. Then, when RMAppImpl#getRMAppMetrics is called, it takes the total resource allocation for each attempt and sums those. Then the two results are compared. Perhaps there needs to be an additional Assert to make sure these are never 0, but other than that, I think these two calculations are arrived at independently. Is that the testing you suggested?
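          To make the comparison concrete, the check performed by such a test looks roughly like the helper below. This is only a sketch: the container time accessors (getCreationTime/getFinishTime) and the metric getters on RMAppMetrics are assumptions based on this discussion, not code quoted from TestContainerResourceUsage.

              import java.util.List;
              import org.junit.Assert;
              import org.apache.hadoop.yarn.api.records.Resource;
              import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
              import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppMetrics;
              import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

              public class AggregateUsageCheck {
                // Sum per-container MB-seconds/vcore-seconds independently and compare
                // against the app-level aggregation from RMApp#getRMAppMetrics.
                static void assertAggregateMatches(RMApp app, List<RMContainer> containers) {
                  long expectedMemorySeconds = 0;
                  long expectedVcoreSeconds = 0;
                  for (RMContainer c : containers) {
                    long usedMillis = c.getFinishTime() - c.getCreationTime();
                    Resource r = c.getContainer().getResource();
                    expectedMemorySeconds += r.getMemory() * usedMillis / 1000;
                    expectedVcoreSeconds += r.getVirtualCores() * usedMillis / 1000;
                  }
                  Assert.assertTrue("expected usage should be non-zero",
                      expectedMemorySeconds > 0);
                  RMAppMetrics metrics = app.getRMAppMetrics();
                  Assert.assertEquals(expectedMemorySeconds, metrics.getMemorySeconds());
                  Assert.assertEquals(expectedVcoreSeconds, metrics.getVcoreSeconds());
                }
              }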

          Jian He added a comment -

          TestContainerResourceUsage

          Ah, sorry, I missed this newly added file.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12663542/YARN-415.201408212033.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          -1 release audit. The applied patch generated 3 release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4687//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4687//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4687//console

          This message is automatically generated.

          Eric Payne added a comment -

          -1 release audit. The applied patch generated 3 release audit warnings.

          Files triggering audit warnings are not part of this patch: EncryptionFaultInjector.java, EncryptionZoneManager.java, and EncryptionZoneWithId.java

          -1 core tests. The patch failed these unit tests
          org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

          This test failure is intermittent and does not seem to be caused by this patch. Please see:
          https://builds.apache.org/job/PreCommit-YARN-Build/4711/
          https://builds.apache.org/job/PreCommit-YARN-Build/4727/

          Jian He and Karthik Kambatla, I really appreciate all of your help in reviewing this patch and making it better with your suggestions. How close are we to getting this patch submitted?

          Jian He added a comment -

          Hi Eric Payne, thanks for all your effort on this. The patch looks good to me overall. Karthik Kambatla, do you want to take another look?

          Karthik Kambatla (Inactive) added a comment -

          Looking at the patch now, sorry for the delay.

          By the way, there was an offline discussion (documented on YARN-1530) about storing similar app-related metrics in the ATS. It would be nice for the parties involved here to think about it and follow up in another JIRA.

          Karthik Kambatla (Inactive) added a comment -

          The patch doesn't apply anymore, due to recent changes to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto

          Karthik Kambatla (Inactive) added a comment -

          Patch looks pretty clean and mostly good. Minor comments:

          1. ResourceManagerRest.apt.vm documents the memory and vcores as utilized. We should update this to allocated.
          2. Methods added to ApplicationAttemptStateData should probably be explicitly marked Public-Unstable.
          3. Annotate AggregateAppResourceUsage Private (see the sketch after this list)
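          For items 2 and 3, the standard Hadoop classification annotations would look roughly like the sketch below. Field names are illustrative; item 2 is analogous, with @InterfaceAudience.Public and @InterfaceStability.Unstable on the new ApplicationAttemptStateData accessors.

              import org.apache.hadoop.classification.InterfaceAudience.Private;

              // Item 3: mark the new helper class audience-private.
              @Private
              public class AggregateAppResourceUsage {
                private long memorySeconds;
                private long vcoreSeconds;

                public AggregateAppResourceUsage(long memorySeconds, long vcoreSeconds) {
                  this.memorySeconds = memorySeconds;
                  this.vcoreSeconds = vcoreSeconds;
                }

                public long getMemorySeconds() {
                  return memorySeconds;
                }

                public long getVcoreSeconds() {
                  return vcoreSeconds;
                }
              }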

          Otherwise, looks good to me.

          Eric Payne added a comment -

          Karthik Kambatla, thank you for taking the time to review this patch.

          The patch doesn't apply anymore,

          Upmerged patch to latest branch-2 and trunk.

          1. ResourceManagerRest.apt.vm documents the memory and vcores as utilized. We should update this to allocated.

          Changed the text.

          2. Methods added to ApplicationAttemptStateData should probably be explicitly marked Public-Unstable.

          Done

          3. Annotate AggregateAppResourceUsage Private

          Annotated

          By the way, there was an offline discussion (documented on YARN-1530) about storing similar app-related metrics in the ATS. It would be nice for parties involved here to think about it and follow up on another JIRA.

          I am looking into this now. In the meantime, can you please let me know if this current patch resolves your concerns?
          Thank you, -Eric Payne

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12666493/YARN-415.201409040036.txt
          against trunk revision 8f1a668.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4823//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4823//console

          This message is automatically generated.

          Jian He added a comment -

          Looks good to me. Just one more question: I've kind of lost the context for why we need this check; it seems we don't need it, because the returned ApplicationResourceUsageReport for a non-active attempt is null anyway.

              // Only add in the running containers if this is the active attempt.
              RMAppAttempt currentAttempt = rmContext.getRMApps()
                             .get(attemptId.getApplicationId()).getCurrentAppAttempt();
              if (currentAttempt != null &&
                  currentAttempt.getAppAttemptId().equals(attemptId)) {
          
          Eric Payne added a comment -

          Jian He, thank you very much for your time in reviewing this patch and your helpful suggestions.

          it seems we don't need this check, because the returned ApplicationResourceUsageReport for a non-active attempt is null anyway.

              // Only add in the running containers if this is the active attempt.
              RMAppAttempt currentAttempt = rmContext.getRMApps()
                             .get(attemptId.getApplicationId()).getCurrentAppAttempt();
              if (currentAttempt != null &&
                  currentAttempt.getAppAttemptId().equals(attemptId)) {
          

          You are correct. The above check for currentAttempt != null is not necessary.
          With this new patch, I have upmerged again (since it wasn't applying cleanly) and removed this check.

          Karthik Kambatla, I would also like to thank you for your help on this patch. Were you okay with the changes I made in response to your suggestions? It would be great if we could move this patch over the goal line soon.

          Karthik Kambatla (Inactive) added a comment - - edited

          Eric - I haven't had a chance to take a look at the latest patch. I trust Jian and you to make sure the concerns are addressed; the suggestions themselves were straightforward. Thanks for staying patient through this long-drawn-out JIRA.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12667510/YARN-415.201409092204.txt
          against trunk revision 28d99db.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4863//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4863//console

          This message is automatically generated.

          Jian He added a comment -

          Hi Eric Payne, sorry for being unclear. My main question was: is this currentAttempt.getAppAttemptId().equals(attemptId) check still necessary? The return value of scheduler#getAppResourceUsageReport for a non-active attempt is empty/null anyway.

          Eric Payne added a comment -

          Thanks for clarifying, Jian He.

          is this currentAttempt.getAppAttemptId().equals(attemptId) still necessary ? since the return value of scheduler#getAppResourceUsageReport for non-active attempt is anyways empty/null.

          I believe that the check is necessary. Here are a couple of points.

          • First, RMAppAttemptMetrics#getAggregateAppResourceUsage is called from multiple places, including RMAppImpl#getRMAppMetrics, which loops through all attempts for any given app. If the app is running and has multiple attempts, we want to charge the current attempt for both the running container stats and the containers that finished under that attempt. But when RMAppImpl#getRMAppMetrics loops through and calls RMAppAttemptMetrics#getAggregateAppResourceUsage for the finished attempts, the method needs to know that the attempt ID is not the current attempt so that it doesn't count the running container stats again.
          • Second, from my tests and my reading of the code, I'm pretty sure that scheduler#getAppResourceUsageReport always returns the ApplicationResourceUsageReport for the current attempt, even if you give it a finished attempt. It uses the attemptId to get the app object, and then uses that to get the current attempt. I've tested this, and by taking a look at AbstractYarnScheduler#getApplicationAttempt (which is called by getAppResourceUsageReport for both CapacityScheduler and FairScheduler), we can see that it only uses the attemptId to get the app, and then calls app.getCurrentAttempt().

          I hope that helps to clarify this; a simplified sketch of the double-counting concern is included below.
          Thank you
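
          To illustrate the point above, here is a minimal, self-contained sketch of the aggregation pattern being described. The class, method, and field names (AttemptUsage, aggregateMemorySeconds, the finished/running fields) are illustrative stand-ins, not the actual ResourceManager code; the sketch only mirrors, at a high level, the loop in RMAppImpl#getRMAppMetrics and the attempt-ID check discussed in this comment.

          import java.util.Arrays;
          import java.util.List;

          // Illustrative stand-in for per-attempt usage data; not the real RM type.
          class AttemptUsage {
            final String attemptId;
            final long finishedMemorySeconds;          // MB-seconds from containers that already completed
            final long runningContainerMemorySeconds;  // MB-seconds accrued so far by still-running containers

            AttemptUsage(String attemptId, long finished, long running) {
              this.attemptId = attemptId;
              this.finishedMemorySeconds = finished;
              this.runningContainerMemorySeconds = running;
            }
          }

          public class AggregateUsageSketch {
            // Mirrors the loop described above: every attempt contributes its
            // finished-container usage, but only the current attempt also contributes
            // running-container usage. Without the attempt-ID check, a finished attempt
            // would also be charged for the current attempt's running containers, and
            // those MB-seconds would be counted again once the containers complete.
            static long aggregateMemorySeconds(List<AttemptUsage> attempts, String currentAttemptId) {
              long total = 0;
              for (AttemptUsage attempt : attempts) {
                total += attempt.finishedMemorySeconds;
                if (attempt.attemptId.equals(currentAttemptId)) {
                  total += attempt.runningContainerMemorySeconds;
                }
              }
              return total;
            }

            public static void main(String[] args) {
              List<AttemptUsage> attempts = Arrays.asList(
                  new AttemptUsage("appattempt_1", 4096, 0),     // earlier attempt, finished
                  new AttemptUsage("appattempt_2", 1024, 2048)); // current attempt, still running
              // Expected: 4096 + 1024 + 2048 = 7168 MB-seconds
              System.out.println(aggregateMemorySeconds(attempts, "appattempt_2"));
            }
          }

          In the real code the running-container usage comes from scheduler#getAppResourceUsageReport which, as noted above, reports on the current attempt regardless of which attemptId it is given; hence the explicit equals() check rather than relying on the scheduler lookup.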

          Jian He added a comment -

          Eric, thanks for your explanation. Sounds good to me.
          One nit: I found the new APIs added in ApplicationResourceUsageReport don't have code comments. Could you add those as well?
          I'd like to commit this once that is fixed. Thanks for all your patience!

          Eric Payne added a comment -

          Thanks a lot, Jian He.

          I have added comment headers for the new APIs in ApplicationResourceUsageReport.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12667877/YARN-415.201409102216.txt
          against trunk revision 7f80e14.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4877//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4877//console

          This message is automatically generated.

          Jian He added a comment -

          Committed to trunk and branch-2.

          Thanks Eric Payne and Andrey Klochkov for working on this!
          Also thanks Wangda, Jason, and Karthik for previous reviews!

          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/677/)
          YARN-415. Capture aggregate memory allocation at the app-level for chargeback. Contributed by Eric Payne & Andrey Klochkov (jianhe: rev 83be3ad44484bf8a24cb90de4b9c26ab59d226a8)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AggregateAppResourceUsage.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationResourceUsageReportPBImpl.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/)
          YARN-415. Capture aggregate memory allocation at the app-level for chargeback. Contributed by Eric Payne & Andrey Klochkov (jianhe: rev 83be3ad44484bf8a24cb90de4b9c26ab59d226a8)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationResourceUsageReportPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AggregateAppResourceUsage.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/)
          YARN-415. Capture aggregate memory allocation at the app-level for chargeback. Contributed by Eric Payne & Andrey Klochkov (jianhe: rev 83be3ad44484bf8a24cb90de4b9c26ab59d226a8)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMetrics.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationResourceUsageReportPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AggregateAppResourceUsage.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
          Andrey Klochkov added a comment -

          Eric Payne, congratulations and thanks for this tremendous amount of persistence!

          Eric Payne added a comment -

          I would like to express my thanks to Andrey Klochkov, Jian He, Wangda Tan, Karthik Kambatla, Sandy Ryza, and Jason Lowe. It was a team effort, and I appreciate all of the great help you have given on this feature.

          Sandy Ryza added a comment -

          Awesome to see this go in!

          Kendall Thrapp added a comment -

          Thanks Eric Payne, Andrey Klochkov, Jian He, Wangda Tan, Karthik Kambatla, Sandy Ryza, and Jason Lowe for all your effort on this! Looking forward to being able to use this feature.


            People

            • Assignee: Eric Payne
            • Reporter: Kendall Thrapp
            • Votes: 0
            • Watchers: 32
