Hadoop YARN

YARN-415: Capture memory utilization at the app-level for chargeback

    Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.6
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None
    • Target Version/s:

      Description

      For the purpose of chargeback, I'd like to be able to compute the cost of an
      application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it.

      (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
      container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n)
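
      As a rough illustration (a hypothetical sketch by way of example, not part of any attached patch), the MB-seconds figure described above is just the sum over containers of reserved memory multiplied by container lifetime:

          // Hypothetical helper illustrating the MB-seconds definition above.
          // reservedMb[i] is the memory reserved for container i (in MB) and
          // lifetimeSeconds[i] is how long container i was allocated (in seconds).
          public final class MemorySecondsExample {
            static long memoryMbSeconds(long[] reservedMb, long[] lifetimeSeconds) {
              long total = 0;
              for (int i = 0; i < reservedMb.length; i++) {
                total += reservedMb[i] * lifetimeSeconds[i];
              }
              return total;
            }

            public static void main(String[] args) {
              // Two containers: 2048 MB for 60 s and 1024 MB for 600 s.
              System.out.println(memoryMbSeconds(new long[] {2048, 1024},
                                                 new long[] {60, 600}));  // 737280
            }
          }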

      It'd be nice to have this at the app level instead of the job level because:
      1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server).
      2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).

      This new metric should be available both through the RM UI and RM Web Services REST API.

      Attachments

      1. YARN-415.patch
        57 kB
        Andrey Klochkov
      2. YARN-415--n10.patch
        103 kB
        Andrey Klochkov
      3. YARN-415--n2.patch
        57 kB
        Andrey Klochkov
      4. YARN-415--n3.patch
        60 kB
        Andrey Klochkov
      5. YARN-415--n4.patch
        55 kB
        Andrey Klochkov
      6. YARN-415--n5.patch
        55 kB
        Andrey Klochkov
      7. YARN-415--n6.patch
        59 kB
        Andrey Klochkov
      8. YARN-415--n7.patch
        59 kB
        Andrey Klochkov
      9. YARN-415--n8.patch
        63 kB
        Andrey Klochkov
      10. YARN-415--n9.patch
        59 kB
        Andrey Klochkov

        Issue Links

          Activity

          Kendall Thrapp created issue -
          Arun C Murthy added a comment -

          +1 for the thought!

          Andy Rhee added a comment -

          +1 This feature is way overdue!

          Andrey Klochkov made changes -
          Assignee: Andrey Klochkov [ aklochkov ]
          Andrey Klochkov added a comment -

          The patch exposes MB-seconds and CPU-seconds through CLI, REST API and UI.

          Andrey Klochkov made changes -
          Attachment YARN-415.patch [ 12604125 ]
          Andrey Klochkov made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12604125/YARN-415.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1973//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1973//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1973//console

          This message is automatically generated.

          Andrey Klochkov added a comment -

          Updating the patch with fixes for the findbugs warnings on multithreaded correctness.

          Andrey Klochkov made changes -
          Attachment YARN-415--n2.patch [ 12604132 ]
          Andrey Klochkov added a comment -

          Updating the patch with fixes in tests.

          Andrey Klochkov made changes -
          Attachment YARN-415--n3.patch [ 12604142 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12604132/YARN-415--n2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1975//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1975//console

          This message is automatically generated.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12604142/YARN-415--n3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1977//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1977//console

          This message is automatically generated.

          Omkar Vinit Joshi added a comment -

          I'm not sure whether it affects this or not, but today if a client asks for, say, 2000, we round it off, and this patch will report usage based on the rounded value. Is that right?
          Also, a few questions about the scenarios:

          • What if a container is allocated by the RM and, before the AM launches it, the NM reboots (invalidating the earlier container)? Should the user still be charged? And what if the AM reports back saying:
            • It doesn't want the container - I think it should still be charged, as it asked for one earlier... but I remember there being an issue where the RM allocated one more or one less container than the user requested (obviously an RM scheduler issue).
            • It doesn't want the container, perhaps because it is not able to launch it on the NM for whatever reason.
          • For short-running containers there is a gap between when a container actually finishes and when that is reported back to the RM - are we accounting for this? Similarly for the gap between the RM allocating (or processing the ACQUIRED event) and the AM receiving the container.
          • Also, today the container finish time is computed on the RM side. I think it should be computed on the NM side and communicated back to the RM (with the implicit assumption that both servers are in sync).
          Omkar Vinit Joshi made changes -
          Status: Patch Available [ 10002 ] -> Open [ 1 ]
          Andrey Klochkov added a comment -

          The proposed implementation uses events fired by the scheduler to track resource usage, so we start ticking as soon as a container is allocated by the scheduler and stop when the container is completed and the scheduler "gets" the resources back. Hence, if a container fails to start for some reason, we'll stop ticking as soon as the RM gets this reported. As for the gap between when a container actually finishes and when the RM gets the report, we don't manage it, i.e. the client will be charged until the RM gets the report. Start time and finish time are both computed by the scheduler, i.e. on the RM side.

          Not sure about rounding off - can you point me to the code which does that? I think we just use what's provided in the ApplicationSubmissionContext, i.e. it shouldn't be rounded off.
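
          To make that concrete, here is a minimal, hypothetical sketch of the kind of bookkeeping described above (names and structure are illustrative, not the code from the attached patch): start accumulating when the scheduler allocates a container and fold its usage into the totals when the scheduler gets the resources back.

              import java.util.HashMap;
              import java.util.Map;

              // Illustrative per-app-attempt tracker; not the code from the patch.
              class AppResourceUsageTracker {
                // containerId -> { allocation time (ms), reserved MB, reserved vcores }
                private final Map<String, long[]> running = new HashMap<>();
                private long memorySeconds;  // MB-seconds from finished containers
                private long vcoreSeconds;   // vcore-seconds from finished containers

                synchronized void containerAllocated(String containerId, long memoryMb,
                    long vcores, long allocTimeMs) {
                  running.put(containerId, new long[] { allocTimeMs, memoryMb, vcores });
                }

                synchronized void containerFinished(String containerId, long finishTimeMs) {
                  long[] c = running.remove(containerId);
                  if (c == null) {
                    return;  // unknown container; nothing to account for
                  }
                  long seconds = (finishTimeMs - c[0]) / 1000;
                  memorySeconds += c[1] * seconds;
                  vcoreSeconds += c[2] * seconds;
                }

                // Real-time view: finished containers plus those still running.
                synchronized long getMemorySeconds(long nowMs) {
                  long total = memorySeconds;
                  for (long[] c : running.values()) {
                    total += c[1] * ((nowMs - c[0]) / 1000);
                  }
                  return total;
                }
              }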

          Jason Lowe added a comment -

          I think the accounting should be based on the RM's view of the cluster state, since it is controlling which applications get the resources. If the RM allocates a container for an application, whether the application ultimately uses it or not, it is a resource that is unavailable for other applications to use. Similarly if an RM allocates a container but it takes a while for the AM to heartbeat in to receive it, takes even longer for the AM to connect to the NM to launch it, and takes even longer for the NM to report back that the container completed, all of that is still time where the container's resources were unavailable to other applications. The RM was setting aside those resources on behalf of that application, therefore the application should be accountable.

          Bugs in the RM or scheduler may cause extraneous containers to be allocated, but they are still allocated on behalf of a particular application and should be charged accordingly.

          Jason Lowe added a comment -

          The latest patch no longer applies to trunk. Could you please refresh it? Some review comments:

          General:

          • Nit: the extra plurality of VirtualCoresSeconds sounds a bit odd, wondering if it should be VirtualCoreSeconds or VcoreSeconds in the various places it appears.

          ApplicationCLI:

          • UI wording: In the code it's vcore-seconds but the UI says CPU-seconds. I'm wondering if users are going to interpret CPU to be a hardware core, and I'm not sure a vcore will map to a hardware core in the typical case. The configuration properties refer to vcores, so we should probably use vcore-seconds here for consistency. Curious what others think about this, as I could be convinced to leave it as CPU.

          RMAppAttempt has just a spurious whitespace change

          RMAppAttemptImpl:

          • Nit: containerAllocated and containerFinished are private and always called from transitions, so acquiring the write lock is unnecessary.
          • ContainerFinishedTransition.transition does not call containerFinished when it's the AM container. We "leak" the AM container and consider it always running if an AM crashes.

          RMContainerEvent:

          • Nit: whitespace between the constructor definitions would be nice.

          TestRMAppAttemptTransitions:

          • Nit: it would be cleaner and easier to read if we add a new allocateApplicationAttemptAtTime method and have the existing allocateApplicationAttempt method simply call it with -1 rather than change all those places to pass -1.

          Speaking of "leaking" containers, is there something we can do to audit/assert that applications that have completed don't have running containers? If we lose track of a container finished event, the consumed resources are going to keep increasing indefinitely. It's a bug in the RM either way but wondering if there's some warning/sanity checking we can do to keep the metric from becoming utterly useless when it occurs. Capping it at the end of the application would at least prevent it from growing beyond the application lifetime. Then again, letting it grow continuously at least is more indicative something went terribly wrong with the accounting and therefore the metric can't be trusted. Just thinking out loud, not sure what the best solution is.

          Andrey Klochkov added a comment -

          Jason, thanks for the thorough review. Attaching the patch with fixes. I basically made all the fixes you're proposing except the last one about capturing the leak.

          Andrey Klochkov made changes -
          Attachment YARN-415--n4.patch [ 12606923 ]
          Jason Lowe made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Target Version/s: 2.3.0 [ 12324589 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12606923/YARN-415--n4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2140//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2140//console

          This message is automatically generated.

          Andrey Klochkov added a comment -

          Fixing the failed test.

          Andrey Klochkov made changes -
          Attachment YARN-415--n5.patch [ 12607275 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12607275/YARN-415--n5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2142//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2142//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Thanks for updating the patch, Andrey. The changes look good, and I think we're close. However, I spent a bit more time thinking about the leak case, and there's at least one more place where we can leak the bookkeeping for containers in RMAppAttemptImpl.

          If an AM crashes while tasks are running then we'll get a CONTAINER_FINISHED event, clean up the AM container and move the app attempt to the FAILED state. However in the FAILED state we ignore any container events, and subsequent container events for tasks that are killed off after the AM attempt dies will be ignored. That means we leak the bookkeeping for some containers and the resource utilization of the app will increase indefinitely.

          Therefore I think with this change we can no longer ignore CONTAINER_ALLOCATED or CONTAINER_FINISHED events as is currently done in a number of states.

          There is the option of forcing the container list to be empty when we enter terminal states. This may help cover some leak scenarios but isn't totally correct from a bookkeeping standpoint if those containers are technically still active. Not an issue in the short-term because when an AM attempt dies all the active containers of the app are killed immediately afterwards. However I could see that potentially changing in the future for long-lived YARN apps (otherwise AM becomes SPoF for the whole app).

          Jason Lowe made changes -
          Status: Patch Available [ 10002 ] -> Open [ 1 ]
          Andrey Klochkov added a comment -

          With the first option it's not clear how to implement protection from leaks; there's no event which can be used to check for leaks in that case. At the same time, current YARN behavior does not support containers surviving after the AM is finished, so the second option is acceptable. This may need to change once there is support for long-lived apps and attempts that stay alive after the AM is stopped.

          Attaching a patch which implements option #2 and adds a test for it.

          Andrey Klochkov made changes -
          Attachment YARN-415--n6.patch [ 12607895 ]
          Andrey Klochkov made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12607895/YARN-415--n6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2161//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2161//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Thanks for the update Andrey. This change should resolve my concerns about the running container leaks, but there are some points that need to be addressed with respect to logging in cleanupRunningContainers:

          • If an application with many containers in flight simply unregisters and exits expecting the RM to clean up the mess, or the application simply crashes, we're going to log a lot of messages for all those containers. Currently the RM already kills all of an application's current containers, so we're talking about being incorrect on the order of a few milliseconds for a sane RM. I think this should be an INFO rather than a WARN. Also, we probably want to log a single message per application stating how many containers were affected, rather than listing specific ones, since we don't currently expose container-specific metrics anyway (see the sketch after this list).
          • There's a "new memSec" log message that appears to be a debugging artifact that was left in the patch
          Andrey Klochkov added a comment -

          Thanks Jason. Attaching a fixed patch.

          Andrey Klochkov made changes -
          Attachment YARN-415--n7.patch [ 12609031 ]
          Kendall Thrapp added a comment -

          Thanks Andrey for implementing this. I'm looking forward to being able to use it. Just a reminder to also update the REST API docs (http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API).

          Andrey Klochkov added a comment -

          Kendall, sure, I can update the Wiki as soon as the patch is committed. By the way, how can I get write access to the Wiki?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12609031/YARN-415--n7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2208//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2208//console

          This message is automatically generated.

          Jason Lowe added a comment -

          The REST API doc change is a good catch, Kendall. The source for that doc is at hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm and should be updated as part of this patch.

          Andrey Klochkov added a comment -

          Adding changes in REST API docs to the patch.

          Andrey Klochkov made changes -
          Attachment YARN-415--n8.patch [ 12609068 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12609068/YARN-415--n8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2213//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2213//console

          This message is automatically generated.

          Arun C Murthy added a comment -

          I'm sorry to come in late, I just did a cursory look.

          One question: Do we really need to track ResourceUsage for each Container? Can't we just add it up when a container finishes? Maybe I'm missing something? But, I'd like to not have a lot of per-container state if possible. Thanks.

          Andrey Klochkov added a comment -

          Arun, the idea is to have the stats updated in real time while the app is running. Is there a way to get a list of running containers assigned to the app, with their start times, without tracking it explicitly?

          Jason Lowe added a comment -

          It's not just a real-time issue, it's also a correctness issue. When a container finishes we need to know the time it was allocated. So regardless of whether we want to compute the usage in real-time, the start time of a container and its resource sizes need to be tracked somewhere in the RM.

          ResourceUsage is just a Resource plus a start time, and the Resource should be referencing the same object already referenced by the Container inside RMContainerImpl. To implement this feature we need to track the containers that are allocated/running (already being done by RMContainerImpl) and what time they started (which we are not currently doing, and which is why ResourceUsage was created).

          There is the issue of the HashMap to map a container ID to its resource and start time. We could remove the need for this if we stored the container start time in RMContainerImpl and had a safe way to lookup containers for an application attempt. We can get the containers for an application via scheduler.getSchedulerAppInfo, and RMAppAttemptImpl already does this when generating an app report. However since RMAppAttemptImpl and the scheduler are running in separate threads, I could see the scheduler already removing the container before RMAppAttemptImpl received the container completion event and tried to lookup the container for usage calculation. Given the race, along with the fact that getSchedulerAppInfo is not necessarily cheap, it seems reasonable to have RMAppAttemptImpl track what it needs for running containers directly.
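
          As a minimal sketch of the ResourceUsage record described above, assuming it simply pairs the container's Resource with its allocation time (class and member names here are hypothetical, not taken from the patch):

              import org.apache.hadoop.yarn.api.records.Resource;

              // Illustrative only: a container's reserved Resource plus when it was allocated.
              class ResourceUsage {
                private final Resource resource;  // same object referenced by RMContainerImpl's Container
                private final long startTimeMs;   // time the scheduler allocated the container

                ResourceUsage(Resource resource, long startTimeMs) {
                  this.resource = resource;
                  this.startTimeMs = startTimeMs;
                }

                Resource getResource() { return resource; }
                long getStartTimeMs()  { return startTimeMs; }
              }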

          Sandy Ryza added a comment -

          However since RMAppAttemptImpl and the scheduler are running in separate threads, I could see the scheduler already removing the container before RMAppAttemptImpl received the container completion event and tried to lookup the container for usage calculation.

          The most accurate timestamp for a container start/end with respect to utilization/chargeback is when the scheduler allocates/releases it, as that's the moment the resource becomes inaccessible/accessible. Is there a reason we need to set these with an asynchronous event after the scheduler has completed the operation? Why not move the calculation entirely into the scheduler (providing code that could be shared across all schedulers)?

          Andrey Klochkov added a comment -

          IMO it makes sense to move this tracking into the scheduler, and in particular SchedulerApplication looks like a good place to have this logic. I'm wondering why SchedulerApplication has everything abstract, while its descendants have a lot of the same fields and code. Why isn't the common code placed in SchedulerApplication itself? If I'm not missing anything here, I'd move all that code and this resource usage tracking into SchedulerApplication. Please comment.

          Sandy Ryza added a comment -

          I don't think there's any strong reason not to move a lot of the common code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication. Filed YARN-1335 for this.

          Andrey Klochkov made changes -
          Link: This issue depends upon YARN-1335 [ YARN-1335 ]
          Andrey Klochkov added a comment -

          Adding YARN-1335 as a dependency.

          Andrey Klochkov added a comment -

          On second thought, the scheduler does not seem like a good place for these stats. It doesn't keep info on finished apps, so if the logic is placed in the scheduler, the tracking data will be gone as soon as an app is done. At the same time, app attempts are kept in the RMContext until evicted, so usage stats can be pulled from there by an external system that handles persistence/reporting/etc.

          Sandy Ryza added a comment -

          The scheduler can write to a tracking object that the RM passes to it.
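
          For illustration, a minimal sketch of such an RM-owned tracking object (the class and method names are hypothetical, not actual YARN APIs): because the RM owns the object, it outlives the app's entry in the scheduler, and the scheduler only ever writes to it.

            import java.util.concurrent.atomic.AtomicLong;

            // Hypothetical sketch: the RM creates and keeps this object; the scheduler only adds to it.
            class RMOwnedUsage {
              private final AtomicLong memorySeconds = new AtomicLong();

              // Written by the scheduler as containers complete.
              void addMemorySeconds(long mbSeconds) {
                memorySeconds.addAndGet(mbSeconds);
              }

              // Read at any time by the RM web UI / REST layer, even after the scheduler
              // has forgotten the app.
              long getMemorySeconds() {
                return memorySeconds.get();
              }
            }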

          Jason Lowe added a comment -

          The most accurate timestamp for a container start/end with respect to utilization/chargeback is when the scheduler allocates/releases it, as that's the moment that the resource becomes inaccessible/accessible. Is there a reason we need to set these with an asynchronous event after the scheduler has completed the operation?

          The asynchronous event is generated in FiCaSchedulerApp.allocate which is already called by the various schedulers right when the container is allocated. Note that the event is timestamped when it is generated, not when it is received, so the resource usage tracking has an accurate time of allocation when performing usage calculations. Also, this is not generating any new events, just using the events that already exist and are required to be sent.

          Why not move the calculation entirely into the scheduler (providing code that could be shared across all schedulers)?

          I'm not seeing how the proposed change is going to make the calculation significantly more accurate unless the RM is lagging seriously far behind in processing events, and the calculation will be accurate once the app completes, as all events will have been processed by that time.

          I'm not totally against moving the calculation into FiCaSchedulerApp or something similar, but it does create new issues to solve that we didn't have before, like locking between the scheduler and RMAppAttemptImpl to share this resource usage report, and making sure it doesn't disappear when the scheduler is done with the app.
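
          To make the timestamping point above concrete, a minimal sketch (this is not the real YARN event class; names are illustrative): the completion event records the time it was generated, so any delay in event dispatch between threads does not distort the usage calculation.

            // Hypothetical sketch of an event stamped at generation time rather than receipt time.
            class ContainerCompletionEvent {
              private final long generatedAtMillis = System.currentTimeMillis(); // set when the event is created
              private final int memoryMb;

              ContainerCompletionEvent(int memoryMb) {
                this.memoryMb = memoryMb;
              }

              long getGeneratedAtMillis() {
                return generatedAtMillis;
              }

              int getMemoryMb() {
                return memoryMb;
              }
            }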

          Sandy Ryza added a comment -

          The asynchronous event is generated in FiCaSchedulerApp.allocate which is already called by the various schedulers right when the container is allocated.

          Didn't realize that. In that case, I don't have a strong opinion on whether it should be in the scheduler or not. Either way, it seems to me that there should only be one place in the RM that maps container IDs in an app attempt to containers (with their usages and start times), and we should use this map for calculating the usage of running containers. Currently this is in the scheduler, but we could move it into RMAppAttempt if that makes more sense and have the scheduler reference it there?

          Andrey Klochkov added a comment -

          Updated the patch, moving the tracking logic into the scheduler:

          • AppSchedulingInfo tracks resource usage. Existing methods are reused, and overall it seems like the right place for this logic (see the sketch after this list).
          • When an app is finished and the scheduler evicts it from its cache, it sends a new type of event (RMAppAttemptAppFinishedEvent) to the attempt, attaching the usage stats to the event.
          • the RMAppAttemptImpl test is modified accordingly
          • a new test is added to verify resource tracking in AppSchedulingInfo
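
          For illustration, a minimal sketch of the accumulation described in the first bullet (field and method names are illustrative, not the patch's): MB-seconds and vcore-seconds grow as each container completes.

            // Hypothetical sketch of scheduler-side usage accumulation on container completion.
            class ResourceUsageTotals {
              private long memorySeconds;
              private long vcoreSeconds;

              // Called once per completed container with its allocation and completion times.
              synchronized void containerCompleted(long startMillis, long finishMillis,
                                                   int memoryMb, int vcores) {
                long seconds = Math.max(0, (finishMillis - startMillis) / 1000);
                memorySeconds += (long) memoryMb * seconds;
                vcoreSeconds += (long) vcores * seconds;
              }

              synchronized long getMemorySeconds() { return memorySeconds; }
              synchronized long getVcoreSeconds() { return vcoreSeconds; }
            }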
          Andrey Klochkov made changes -
          Attachment YARN-415--n9.patch [ 12610399 ]
          Andrey Klochkov added a comment -

          This scheme has a downside: the stats would be incorrect between two events, 1) the scheduler evicting the app from the cache and sending an event, and 2) RMAppAttemptImpl receiving the event and updating its internal stats. The only idea I have is to add an additional roundtrip, extending the scheme to:
          1. When the app is finished, the scheduler sends an RMAppAttemptAppFinishedEvent instance and does not evict the app from the cache yet.
          2. RMAppAttemptImpl receives the event, updates its internal fields "finalMemorySeconds" and "finalVcoreSeconds", and sends a new event type to the scheduler allowing it to evict the app.
          3. The scheduler gets the event and evicts the app.

          Thoughts?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12610399/YARN-415--n9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2292//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2292//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2292//console

          This message is automatically generated.

          Jason Lowe added a comment -

          I haven't fully digested the latest patch yet, but here are some initial impressions:

          I believe Sandy's intention was to remove the need for a separate runningContainers map, but that map still exists in the patch and has simply been moved from RMAppAttemptImpl to SchedulerAppInfo. This necessitated a new event and added a new race condition, so I'm not sure this is a better overall approach.

          To remove the need for a separate runningContainers map, we need to reuse the place in the code where the schedulers already track the active containers for an application, and that's SchedulerApplication.liveContainers. We could extend RMContainer to expose an allocation start time; then we can compute the resource consumption for the active containers in SchedulerApplication and roll it up into a usage total when containers complete and are removed from liveContainers. That way we at least eliminate an extra map for tracking active containers.

          As for the race condition, how about requiring schedulers to retain app attempts in their cache until signaled by RMAppAttemptImpl that it can be flushed? RMAppAttemptImpl already knows (eventually) when an application completes, and it can grab the latest app report with the rollup of resource usage from the scheduler, cache that usage locally into a total, then tell the scheduler via a new scheduler event that it can release the app attempt from its cache.
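
          A minimal sketch of this liveContainers-based approach (all names are illustrative stand-ins, not the actual RMContainer/SchedulerApplication code): the one map the scheduler already keeps is used both to roll finished containers into a total and to charge still-running containers up to the current time.

            import java.util.HashMap;
            import java.util.Map;

            class LiveContainerUsage {
              // Minimal stand-in for an RMContainer that exposes its allocation start time.
              static final class ContainerInfo {
                final long startMillis;
                final int memoryMb;
                ContainerInfo(long startMillis, int memoryMb) {
                  this.startMillis = startMillis;
                  this.memoryMb = memoryMb;
                }
              }

              private final Map<String, ContainerInfo> liveContainers = new HashMap<>();
              private long finishedMemorySeconds;

              synchronized void containerAllocated(String containerId, int memoryMb) {
                liveContainers.put(containerId, new ContainerInfo(System.currentTimeMillis(), memoryMb));
              }

              synchronized void containerCompleted(String containerId) {
                ContainerInfo c = liveContainers.remove(containerId);
                if (c != null) {
                  finishedMemorySeconds +=
                      (long) c.memoryMb * ((System.currentTimeMillis() - c.startMillis) / 1000);
                }
              }

              // Live containers are charged up to "now"; finished ones are already rolled up.
              synchronized long currentMemorySeconds() {
                long total = finishedMemorySeconds;
                long now = System.currentTimeMillis();
                for (ContainerInfo c : liveContainers.values()) {
                  total += (long) c.memoryMb * ((now - c.startMillis) / 1000);
                }
                return total;
              }
            }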

          Andrey Klochkov added a comment -

          Updating the patch:

          • got rid of the "runningContainers" map
          • refactored SchedulerApplication by pulling some of the methods up from its descendants (allocate, containerCompleted, unreserve), making most of the fields private, and modifying "containerCompleted" to do the resource usage tracking
          • made the scheduler not evict an app immediately after the attempt finishes, but wait for a signal from RMAppAttemptImpl (a sketch of this handshake follows below)
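
          A minimal sketch of the eviction handshake in the last bullet (names are illustrative, not the patch's): the scheduler keeps a finished attempt's totals around until RMAppAttemptImpl confirms it has copied them, so the stats cannot disappear mid-handoff.

            import java.util.Map;
            import java.util.concurrent.ConcurrentHashMap;

            class FinishedAttemptCache {
              private final Map<String, Long> finalMemorySecondsByAttempt = new ConcurrentHashMap<>();

              // Attempt finished: publish its final usage, but do not evict it yet.
              void attemptFinished(String attemptId, long memorySeconds) {
                finalMemorySecondsByAttempt.put(attemptId, memorySeconds);
              }

              // RMAppAttemptImpl signals it has cached the totals; now it is safe to evict.
              void usageAcknowledged(String attemptId) {
                finalMemorySecondsByAttempt.remove(attemptId);
              }

              Long getFinalMemorySeconds(String attemptId) {
                return finalMemorySecondsByAttempt.get(attemptId);
              }
            }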
          Andrey Klochkov made changes -
          Attachment YARN-415--n10.patch [ 12610446 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12610446/YARN-415--n10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 10 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2294//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2294//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Thanks, Andrey, for updating the patch to the new design. Further comments:

          • The patch no longer applies after YARN-891 and needs to be refreshed.
          • I think RMAppAttemptEventType.APP_FINISHED should be something like RMAppAttemptEventType.APP_SCHEDULER_STATS and include the final resource usage stats. This eliminates an extra getSchedulerApp call that the app attempt currently has to make, and that call is not cheap. It also helps reduce some confusion about what APP_FINISHED really means and why the scheduler is sending it.
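
          For illustration, a minimal sketch of the event shape suggested here (APP_SCHEDULER_STATS is the proposed name; the class below is hypothetical, not the patch's code): the scheduler pushes the final totals along with the event, so the attempt never has to call back into the scheduler.

            // Hypothetical sketch of an attempt event that carries the final usage stats.
            class RMAppAttemptSchedulerStatsEvent {
              private final long finalMemorySeconds;
              private final long finalVcoreSeconds;

              RMAppAttemptSchedulerStatsEvent(long finalMemorySeconds, long finalVcoreSeconds) {
                this.finalMemorySeconds = finalMemorySeconds;
                this.finalVcoreSeconds = finalVcoreSeconds;
              }

              long getFinalMemorySeconds() { return finalMemorySeconds; }
              long getFinalVcoreSeconds() { return finalVcoreSeconds; }
            }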
          Junping Du added a comment -

          This work is very important for making YARN serve as a public cloud service (at the PaaS layer). Andrey, thanks for working on this!
          Any new progress after Jason's last comments?

          Kendall Thrapp added a comment -

          Thanks, Andrey, for all your work on this! I'm looking forward to being able to use this. Any updates?

          Andrey Klochkov added a comment -

          I'm updating the patch to make it apply to the current trunk. It's going slowly, but I hope to finish this week.


            People

            • Assignee:
              Andrey Klochkov
              Reporter:
              Kendall Thrapp
            • Votes:
              0
              Watchers:
              19
