Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.3-alpha
    • Component/s: None
    • Labels: None

    Attachments

    1. mapreduce-4334-design-doc.txt (7 kB) - Andrew Ferguson
    2. mapreduce-4334-design-doc-v2.txt (7 kB) - Andrew Ferguson
    3. MAPREDUCE-4334-executor-v1.patch (30 kB) - Andrew Ferguson
    4. MAPREDUCE-4334-executor-v2.patch (48 kB) - Andrew Ferguson
    5. MAPREDUCE-4334-executor-v3.patch (50 kB) - Andrew Ferguson
    6. MAPREDUCE-4334-executor-v4.patch (50 kB) - Andrew Ferguson
    7. MAPREDUCE-4334-pre1.patch (17 kB) - Andrew Ferguson
    8. MAPREDUCE-4334-pre2.patch (21 kB) - Andrew Ferguson
    9. MAPREDUCE-4334-pre2-with_cpu.patch (22 kB) - Andrew Ferguson
    10. MAPREDUCE-4334-pre3.patch (21 kB) - Andrew Ferguson
    11. MAPREDUCE-4334-pre3-with_cpu.patch (22 kB) - Andrew Ferguson
    12. MAPREDUCE-4334-v1.patch (22 kB) - Andrew Ferguson
    13. MAPREDUCE-4334-v2.patch (23 kB) - Andrew Ferguson
    14. YARN-3-lce_only-v1.patch (40 kB) - Andrew Ferguson

      Issue Links

        Activity

        Arun C Murthy made changes -
        Status: Resolved → Closed
        Andrew Ferguson added a comment -

        Arun C Murthy thanks for the merge Arun!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1337 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1337/)
        YARN-355. Fixes a bug where RM app submission could jam under load. Contributed by Daryn Sharp. (Revision 1443131)
        YARN-357. App submission should not be synchronized (daryn) (Revision 1443016)
        YARN-3. Merged to branch-2. (Revision 1443011)

        Result = SUCCESS
        sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443131
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/YarnClientImpl.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/security/RMDelegationTokenRenewer.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/resources
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenRenewer
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java

        daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443016
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java

        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443011
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1309 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1309/)
        YARN-355. Fixes a bug where RM app submission could jam under load. Contributed by Daryn Sharp. (Revision 1443131)
        YARN-357. App submission should not be synchronized (daryn) (Revision 1443016)
        YARN-3. Merged to branch-2. (Revision 1443011)

        Result = FAILURE
        sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443131
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/YarnClientImpl.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/security/RMDelegationTokenRenewer.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/resources
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenRenewer
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java

        daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443016
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java

        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443011
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #120 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/120/)
        YARN-355. Fixes a bug where RM app submission could jam under load. Contributed by Daryn Sharp. (Revision 1443131)
        YARN-357. App submission should not be synchronized (daryn) (Revision 1443016)
        YARN-3. Merged to branch-2. (Revision 1443011)

        Result = FAILURE
        sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443131
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/YarnClientImpl.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/security/RMDelegationTokenRenewer.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/resources
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenRenewer
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java

        daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443016
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java

        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443011
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3329 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3329/)
        YARN-3. Merged to branch-2. (Revision 1443011)

        Result = SUCCESS
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1443011
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        Arun C Murthy added a comment -

        Merged to branch-2.

        Arun C Murthy made changes -
        Fix Version/s: 2.0.3-alpha
        Fix Version/s: 3.0.0
        Arun C Murthy added a comment -

        I didn't realize that this never made it to branch-2; I'll merge it in - this goes well with YARN-2.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1290 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1290/)
        YARN-3. Add support for CPU isolation/monitoring of containers. (adferguson via tucu) (Revision 1423706)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1423706
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/DefaultLCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/LCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1259 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1259/)
        YARN-3. Add support for CPU isolation/monitoring of containers. (adferguson via tucu) (Revision 1423706)

        Result = FAILURE
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1423706
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/DefaultLCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/LCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #70 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/70/)
        YARN-3. Add support for CPU isolation/monitoring of containers. (adferguson via tucu) (Revision 1423706)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1423706
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/DefaultLCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/LCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3138 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3138/)
        YARN-3. Add support for CPU isolation/monitoring of containers. (adferguson via tucu) (Revision 1423706)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1423706
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/DefaultLCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/LCEResourcesHandler.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java
        Alejandro Abdelnur made changes -
        Status: In Progress → Resolved
        Fix Version/s: 3.0.0
        Resolution: Fixed
        Alejandro Abdelnur added a comment -

        Thanks Andrew. I've just committed this to trunk.

        Alejandro Abdelnur added a comment -

        test-patch came back green with the latest patch, https://issues.apache.org/jira/browse/YARN-147?focusedCommentId=13535394&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13535394

        +1. Thanks Andrew for following up on this one and thanks Vinod for reviewing this as well. Final patch is in YARN-147.

        I'll commit it as YARN-3.

        Andrew Ferguson added a comment -

        Vinod Kumar Vavilapalli you bet! I will fix these today.

        thanks,
        Andrew

        Vinod Kumar Vavilapalli added a comment -

        Andrew, can you please look at the FindBugs and the test case issue at YARN-147? Let's try and get this in tomorrow.

        Also, can you please find the pending issues? I can file any that I know of tomorrow. Tx.

        Vinod Kumar Vavilapalli added a comment -

        Did a quick review (incremental review, trusting my previous self). Looks good; let's track the pending items separately. Triggering Jenkins on YARN-147 and will close the tickets once blessed.

        Vinod Kumar Vavilapalli added a comment -

        Will review by EOD today. Thanks for the tip.

        Alejandro Abdelnur added a comment -

        Hey Vinod, have you had a chance to look at the latest patch? It seems all 'must' comments have been addressed. Are you OK with committing and continuing to work on follow-up JIRAs?

        Vinod Kumar Vavilapalli added a comment -

        Thanks for the update Andrew. Will look at this patch sometime later this week. Hopefully, it is closer to commit.

        Andrew Ferguson added a comment -

        hi everyone, sorry for the delay on this patch – the east coast hurricane & other events set me behind schedule.

        I have attached a new version of this work to YARN-147 (v8); it is based on the latest version of trunk. as always, you can see my github tree for exact changes: https://github.com/adferguson/hadoop-common/

        this patch has been tested (and confirmed to work) as follows:

        • default executor, no cgroups
        • Linux executor, no cgroups
        • Linux executor, with cgroups
        • Linux executor, mount cgroups automatically
        • Linux executor, cgroups already mounted & asked to mount
        • error condition: cgroups already mounted & cannot write to cgroup
        • error condition: asked to mount cgroups, but cannot mount

        both error conditions result in the NodeManager halting, as we have discussed above.

        Bikas Saha, to answer your first question: mountCgroups is a function in LinuxContainerExecutor because that class is simply a Java wrapper for the functions provided by the LCE.

        Bikas Saha, to answer your second question: if we use cgroups to limit CPU and there is only one container running on the machine, the current design will allow the container to access all of the CPU resources until other tasks start running (a work-conserving design). this design is using the CPU weights feature of cgroups, rather than the cpu bandwidth feature (or the entirely separate cpusets controller) to limit the bandwidth (a non-work-conserving design).
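
        To make the weights-vs-bandwidth distinction concrete, here is a minimal illustrative sketch (not code from the patch; the class name and cgroup path are hypothetical) of expressing a work-conserving CPU limit by writing cpu.shares proportional to a container's vcores. A hard, non-work-conserving cap would instead write cpu.cfs_quota_us and cpu.cfs_period_us:

        import java.io.IOException;
        import java.nio.charset.StandardCharsets;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.Paths;

        // Sketch only: weight-based (work-conserving) CPU limits via cgroups v1.
        public class CpuSharesSketch {

          // 1024 is the conventional cpu.shares value for one "unit" of CPU.
          private static final int SHARES_PER_VCORE = 1024;

          // Writes cpu.shares for a container's cgroup directory, e.g.
          // /sys/fs/cgroup/cpu/hadoop-yarn/container_42 (hypothetical path).
          public static void setCpuWeight(Path containerCgroupDir, int vcores)
              throws IOException {
            int shares = Math.max(2, vcores * SHARES_PER_VCORE); // kernel minimum is 2
            Files.write(containerCgroupDir.resolve("cpu.shares"),
                Integer.toString(shares).getBytes(StandardCharsets.UTF_8));
          }

          public static void main(String[] args) throws IOException {
            // A 2-vcore container gets weight 2048; with no other containers
            // running it can still use all of the CPU (work-conserving).
            setCpuWeight(Paths.get("/sys/fs/cgroup/cpu/hadoop-yarn/container_42"), 2);
          }
        }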

        thank you,
        Andrew

        Bikas Saha added a comment -

        Does anything need to be done in a context similar to YARN-72 and YARN-73 when using cgroups? I.e., around shutdown and restart of the NM - both clean and unclean shutdown?

        Bikas Saha added a comment -

        Why is mountCGroups a method in LinuxContainerExecutor code? Shouldn't CgroupsLCEResourcesHandler completely abstract out everything about cgroups?

        +  public boolean mountCgroups(List<String> cgroupKVs, String hierarchy) {
        

        It would be really great to add more tests. This is a fairly non-trivial piece of functionality. So having more functional tests that verify cgroups, limits, killing etc are working as expected would help a ton in guarding against bugs and regressions.

        What will happen if we use cgroups to limit CPU but there is only 1 container running on a machine. Will the container be able to access all the CPU until other tasks start running?

        Vinod Kumar Vavilapalli added a comment -

        I think the default should be false since it's not clear what a sensible default mount path is.

        +1

        the sleep is necessary as sometimes the LCE reports that the container has exited, even though the AM process has not terminated. hence, because the process is still running, we can't remove the cgroup yet; therefore, the code sleeps briefly.

        That doesn't sound right. LCE launches a shell which in turn launches the JVM, so I'd think none of them should return earlier than the JVM. We need more information, but we can postpone this to a follow-up ticket.

        since the AM doesn't always have the ID of 1, what do you suggest I do to determine whether the container has the AM or not? if there isn't a good rule, the code can just always sleep before removing the cgroup.

        We will need to augment the containerID with AM-OR-NOT information, which is a bigger change. We can defer this to another ticket.

        great catch! thanks! I've made this non-fatal. now, the NM will attempt to re-mount the cgroup, will print a message that it can't do that because it's mounted, and everything will work, because it will simply work as in the case where the cluster admin has already mounted the cgroups.

        Sure.

        for the hierarchy-prefix, this needs to be configurable since, in the scenario where the admin creates the cgroups in advance, the NM doesn't have privileges to create its own hierarchy.

        Oh, yeah. You are right, we should document this in the description saying that if they are mounted in advance, the hierarchy-prefix should reflect what is already mounted or else NM may fail.

        for the mount-path, this is a good question. Linux distributions mount the cgroup controllers in various locations, so I thought it was better to keep it configurable, since I figured it would be confusing if the OS had already mounted some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then the NM started mounting additional controllers in /path/nm/owns/cgroup/.

        Makes sense now. Let's add some version of this too to the config description.

        is it better to launch a container even if we can't enforce the limits? or is it better to prevent the container from launching? happy to make the necessary quick change.

        I think it is a fatal error if the admin wanted to use cgroups and for some reason, NM cannot enforce it.

        if I'm reading this correctly, yes, that is what I first wanted to do when I started this patch (see discussions at the top of this YARN-3 thread, the early patches for MAPREDUCE-4334, and the current YARN-4). however, it seems we have decided to go another way.

        Just read the whole discussion on this ticket. I think that we went the shorter cut, which I believe is not the right long term solution. Partly can be attributed to the original patch doing so many things. Let's discuss this in a separate JIRA.

        Alejandro Abdelnur added a comment -

        CgroupsLCEResourcesHandler is swallowing exceptions ....

        The user expectation is that if Hadoop is configured to use cgroups, then Hadoop is using cgroups.

        IMO, if we configure Hadoop to use cgroups, and for some reason it cannot, it should be treated as fatal.

        Make ResourcesHandler top level....

        I'd defer this to a follow up patch.

        Andrew Ferguson added a comment -

        thanks for the review Vinod Kumar Vavilapalli. I'll post an updated patch on YARN-147. there's a lot of food for thought here (design questions), so here are some comments:

        yarn.nodemanager.linux-container-executor.cgroups.mount has different defaults in code and in yarn-default.xml

        yeah – personally, I think the default should be false since it's not clear what a sensible default mount path is. I had changed the line in the code in response to Tucu's comment [1], but I'm changing it back to false since true doesn't seem sensible to me. if anyone in the community has a sensible default mount path, then we can surely change the default to true in both the code and yarn-default.xml :-/

        Can you explain this? Is this sleep necessary. Depending on its importance, we'll need to fix the following Id check, AMs don't always have ID equaling one.

        the sleep is necessary as sometimes the LCE reports that the container has exited, even though the AM process has not terminated. hence, because the process is still running, we can't remove the cgroup yet; therefore, the code sleeps briefly.

        since the AM doesn't always have the ID of 1, what do you suggest I do to determine whether the container has the AM or not? if there isn't a good rule, the code can just always sleep before removing the cgroup.
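
        (Illustrative sketch only, not the patch's actual code; the class and method names are hypothetical.) The retry-before-delete idea described above looks roughly like this: rmdir on a cgroup directory only succeeds once no tasks remain in it, so a brief bounded wait avoids failing hard while the container's processes finish exiting:

        import java.io.File;

        // Sketch: bounded retry when removing a container's cgroup directory.
        public class CgroupCleanupSketch {

          // Returns true if the (empty) cgroup directory was removed within timeoutMs.
          public static boolean deleteCgroupWithRetry(File cgroupDir, long timeoutMs)
              throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (System.currentTimeMillis() < deadline) {
              // delete() maps to rmdir here; it fails while tasks are still in the cgroup.
              if (cgroupDir.delete()) {
                return true;
              }
              Thread.sleep(20); // short pause before re-checking
            }
            return cgroupDir.delete(); // one final attempt
          }
        }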

        container-executor.c: If a mount-point is already mounted, mount gives a EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g. on NM restarts). We could unmount cgroup fs on shutdown but that isn't always guaranteed.

        great catch! thanks! I've made this non-fatal. now, the NM will attempt to re-mount the cgroup, will print a message that it can't do that because it's mounted, and everything will work, because it will simply work as in the case where the cluster admin has already mounted the cgroups.

        Not sure of the benefit of configurable yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just always mount to a path that it creates and owns? Similar comment for the hierarchy-prefix.

        for the hierarchy-prefix, this needs to be configurable since, in the scenario where the admin creates the cgroups in advance, the NM doesn't have privileges to create its own hierarchy.

        for the mount-path, this is a good question. Linux distributions mount the cgroup controllers in various locations, so I thought it was better to keep it configurable, since I figured it would be confusing if the OS had already mounted some of the cgroup controllers on /cgroup/ or /sys/fs/cgroup/, and then the NM started mounting additional controllers in /path/nm/owns/cgroup/.

        CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple places - updateCgroup() and createCgroup(). In the latter, if cgroups are enabled and we can't create the file, is it a critical error?

        I'm fine either way. what would people prefer to see? is it better to launch a container even if we can't enforce the limits? or is it better to prevent the container from launching? happy to make the necessary quick change.

        Make ResourcesHandler top level. I'd like to merge the ContainersMonitor functionality with this so as to monitor/enforce memory limits also. ContainersMonitor is top-level; we should make ResourcesHandler top-level as well so that other platforms don't need to create this type hierarchy all over again when they wish to implement some or all of this functionality.

        if I'm reading this correctly, yes, that is what I first wanted to do when I started this patch (see discussions at the top of this YARN-3 thread, the early patches for MAPREDUCE-4334, and the current YARN-4). however, it seems we have decided to go another way.

        thank you,
        Andrew

        [1] https://issues.apache.org/jira/browse/YARN-147?focusedCommentId=13470926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13470926

        Andrew Ferguson added a comment -

        (replying to comments on YARN-147 here instead as per Arun C Murthy's request)

        thanks for catching that bug Siddharth Seth! I've updated my git repo [1], and will post a new patch after addressing the review from Vinod Kone. I successfully tested it quite a bit with and without cgroups back in the summer, but it seems the patch has shifted enough since the testing that I should do it again.

        [1] https://github.com/adferguson/hadoop-common/commits/adf-yarn-147

        Alejandro Abdelnur added a comment -

        Heads up, YARN-147 (the JIRA created to be able to upload patches for YARN-3) has been +1ed. Thx

        Alejandro Abdelnur added a comment -

        The WF for this JIRA is broken, created YARN-147 and posted patch there to run test-patch and do the review.

        Alejandro Abdelnur made changes -
        Link This issue is duplicated by YARN-147 [ YARN-147 ]
        Eli Collins made changes -
        Status Patch Available [ 10002 ] In Progress [ 3 ]
        Andrew Ferguson made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Gavin made changes -
        Workflow jira [ 12719391 ] no-reopen-closed, patch-avail [ 12722169 ]
        Thomas Graves added a comment -

        the YARN project is still waiting to get the proper state transitions setup so the "submit patch" option isn't available. Your comment should be enough for now.

        Andrew Ferguson added a comment -

        hi,

        I would like to mark this JIRA as "patch available" for the patch I uploaded on August 9th. however, it doesn't seem to be available in my list of "More Actions". perhaps there is some other step I need to take?

        thanks!
        Andrew

        Andrew Ferguson made changes -
        Attachment YARN-3-lce_only-v1.patch [ 12539962 ]
        Andrew Ferguson added a comment -

        This patch augments the LinuxContainerExecutor with an LCEResourcesHandler, which can be used to enforce resource limits using either cgroups (in this patch) or sched_setaffinity/taskset (future patch). A DefaultLCEResourcesHandler is also provided which does not enforce any new resource limits.

        The LCEResourcesHandler interface (and concrete classes) is introduced to keep the LinuxContainerExecutor Java class simple, and to separate the cgroups and future sched_setaffinity logic.

        The resources handler code is split across the introduced Java classes, and the existing container-executor native binary. This is done to minimize the amount of code added to the native binary, to provide easy logging via Java mechanisms, and because a singleton is needed to track CPU assignments when using sched_setaffinity/taskset.

        The handler operates synchronously with the execution of the container.
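
        For readers, a rough sketch of what such a pluggable handler could look like follows; the interface and method names here are illustrative and may not match the committed patch exactly.

        import java.io.IOException;
        import org.apache.hadoop.conf.Configurable;
        import org.apache.hadoop.yarn.api.records.ContainerId;
        import org.apache.hadoop.yarn.api.records.Resource;

        public interface LCEResourcesHandlerSketch extends Configurable {
          // Called once when the LinuxContainerExecutor starts up,
          // e.g. to mount cgroups if configured to do so.
          void init() throws IOException;

          // Called before launch: create/configure the container's cgroup
          // according to its Resource allocation.
          void preExecute(ContainerId containerId, Resource resource) throws IOException;

          // The value handed to the native container-executor's new "resources"
          // option, e.g. the list of cgroups to place the task into.
          String getResourcesOption(ContainerId containerId);

          // Called after the container finishes: clean up its cgroup.
          void postExecute(ContainerId containerId);
        }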

        The changes to the native code are:
        1) A resources option has been added to the LaunchContainer command. This option is used to convey a list of cgroups into which the container should be placed before the user command is launched, and, in the future, will be used to alternatively convey a list of CPUs to which the process should be pinned, if using sched_setaffinity instead of cgroups.

        2) A --mount-cgroups command has been added to the native code. This command will mount cgroups controllers and create hierarchies for the NodeManager to manage. This feature is optional (see below), and exposed to the Java code via a new method in LinuxContainerExecutor.java.

        The following configuration options are introduced:

        yarn.nodemanager.linux-container-executor.resources-handler.class – The class which should assist the LCE in handling resources.

        yarn.nodemanager.linux-container-executor.cgroups.hierarchy – The cgroups hierarchy under which to place YARN processes (cannot contain commas). Only used when the LCE resources handler is set to the CgroupsLCEResourcesHandler.

        yarn.nodemanager.linux-container-executor.cgroups.mount – Whether the LCE should attempt to mount cgroups if not found. Only used when the LCE resources handler is set to the CgroupsLCEResourcesHandler.

        yarn.nodemanager.linux-container-executor.cgroups.mount-path – Where the LCE should attempt to mount cgroups if not found. Common locations include /sys/fs/cgroup and /cgroup. The path must exist before the NodeManager is launched. Only used when the LCE resources handler is set to the CgroupsLCEResourcesHandler.
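
        For illustration, the new keys can be read with the standard Configuration API as below; the default values shown are assumptions for the example, not necessarily those shipped in yarn-default.xml.

        import org.apache.hadoop.conf.Configuration;

        final class CgroupsConfigExample {
          static void dump(Configuration conf) {
            String handler = conf.get(
                "yarn.nodemanager.linux-container-executor.resources-handler.class");
            String hierarchy = conf.get(
                "yarn.nodemanager.linux-container-executor.cgroups.hierarchy",
                "/hadoop-yarn");                               // assumed default
            boolean mount = conf.getBoolean(
                "yarn.nodemanager.linux-container-executor.cgroups.mount", false);
            String mountPath = conf.get(
                "yarn.nodemanager.linux-container-executor.cgroups.mount-path");
            System.out.println("handler=" + handler + " hierarchy=" + hierarchy
                + " mount=" + mount + " mount-path=" + mountPath);
          }
        }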

        Arun C Murthy made changes -
        Assignee Andrew Ferguson [ adferguson ]
        Arun C Murthy made changes -
        Workflow no-reopen-closed, patch-avail [ 12672913 ] jira [ 12672913 ]
        Project Hadoop Map/Reduce [ 12310941 ] Hadoop YARN [ 12313722 ]
        Key MAPREDUCE-4334 YARN-3
        Description Once we get in MAPREDUCE-4327, it will be important to actually enforce limits on CPU consumption of containers.

        Several options spring to mind:
        # taskset (RHEL5+)
        # cgroups (RHEL6+)
        Assignee Andrew Ferguson [ adferguson ]
        Arun C Murthy added a comment -

        Maybe I haven't been able to communicate this clearly enough; please let me try again:

        I'd strongly go for a model where platform-specific features (e.g. cgroups, setuid etc.) are supported via the native code and build system (autotool chain) so that we can, from the end-user perspective, automatically deal with them via a single controlling configuration knob i.e. yarn.nodemanager.container-executor in this case.

        The alternative, which is various Java interfaces, is much worse since now you have to configure yarn.nodemanager.container-executor, the resource-enforcer etc. This can also lead to configuration errors such as TasksetEnforcer on RHEL6 or CgroupsEnforcer on RHEL5 etc.

        The native code is, simply, a far simpler option which puts the onus on us and takes the burden away from the end-user or admin.

        Thoughts?

        Arun C Murthy added a comment -

        Also, I'll add that since cgroups is Linux-specific anyway, I don't see how it will be used on other platforms i.e. Windows.

        Arun C Murthy added a comment -

        Alejandro, LCE accomplishes 2 things:

        1. It serves as a 'root' tool with the setuid bit
        2. It serves as the home for Linux-specific container maintenance code

        Now, for other platforms you have to add other ContainerExecutors anyway; e.g. branch-1-win has a WindowsTaskController which will be ported over to trunk as WindowsContainerExecutor.

        As a result, I'd very much like to continue keeping the Linux-specific bits in LCE. Furthermore, with native code it is much, much easier to have platform-specific low-level code, i.e. we can use the autotools chain to resolve RHEL5 vs. RHEL6 etc. Doing that via Java plugins is very, very painful and leads to a proliferation of interfaces and configurations. The native code is something we can deal with very easily via Bigtop and other packaging projects.

        Thoughts?

        Alejandro Abdelnur added a comment -

        Arun, if somebody is willing to install cgrulesengd/cgexec on the nodes then there is no need for super-user privileges; plus, any CE could be used (unmodified) with a ResourceEnforcer injecting cgexec into the launcher invocation. This also has the benefit that if we add more resource dimensions (last bullet above), CE implementations would not need to change, only the ResourceEnforcer. Which means no code duplication: the cgroup configuration logic lives once, in the ResourceEnforcer, as opposed to in every CE that wants to support cgroups. Finally, I like the fact that with the ResourceEnforcer we are doing a clean separation of responsibilities between the ResourceEnforcer (configures) and the ContainerExecutor (executes); IMO this separation will simplify making improvements in each of them without risk of mixing these 2 responsibilities.
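
        As a sketch of the cgexec idea (the controller list and cgroup path below are hypothetical), the enforcer would only need to prefix the normal launch command, leaving the ContainerExecutor itself untouched:

        import java.util.ArrayList;
        import java.util.List;

        final class CgexecWrapper {
          // Wraps an arbitrary container launch command so that it starts inside a
          // pre-configured cgroup.
          static List<String> wrap(List<String> launchCommand, String cgroupPath) {
            List<String> wrapped = new ArrayList<String>();
            wrapped.add("cgexec");
            wrapped.add("-g");
            wrapped.add("cpu:" + cgroupPath);   // e.g. cpu:hadoop-yarn/container_42
            wrapped.addAll(launchCommand);      // original command runs unchanged
            return wrapped;
          }
        }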

        Arun C Murthy added a comment -

        Alejandro - I'm thinking that since only LCE can use cgroups (due to necessary super-user privs etc.), it's simpler to do minimal changes to LCE to create/encapsulate into cgroups. Thoughts?

        Alejandro Abdelnur added a comment -

        I'd like to introduce the ResourceEnforcer interface for the following reasons:

        • It provides clean lifecycle hooks for initializing/configuring/cleaning up cgroups, leaving to the LCE just the actual binding.
        • It will work with multiple container executors as opposed to the LCE only.
        • Makes the changes in the LCE minimal (IMO, the less logic we put in native code the better).
        • taskset could easily be implemented as a ResourceEnforcer.
        • If we eventually want to control other resources via cgroups (such as memory/disk/network), only the ResourceEnforcer would require changes.

        Fair enough?

        Arun C Murthy added a comment -

        Thanks tucu, this is getting close.

        Please help me understand if the following (simpler) proposal will work:

        1. NM calls LCE.launchContainer with the cpu-set.
        2. LCE will create the necessary cgroup if necessary
        3. LCE will launch the process within the cgroup

        Pros: This way, we avoid new interfaces such as ResourceEnforcer and we can also use taskset if necessary. Taskset should also work for DefaultContainerExecutor.

        Thoughts?

        Alejandro Abdelnur added a comment -

        I was chatting offline with Arun about this JIRA. His key concern is that it should be possible to use cgroups without requiring the installation of additional packages and extra OS configuration. As the LinuxContainerExecutor already runs as root, we can leverage that to create the cgroup mounts. This means that the LinuxContainerExecutor is required in order to use cgroups with zero configuration. While the LinuxContainerExecutor is typically used in secure clusters, it can still be used in a non-secure cluster, always running as the mapred user (which would be the equivalent of the DefaultContainerExecutor).

        Given this how about the following proposal?

        This approach will not depend on cgexec binary being installed.

        • The LinuxContainerExecutor would have 2 new options.
          • --cgroupsinit <PARAM..>: This option will be used for initialization. When invoked with this option, the LCE will create the cgroup mount point and give ownership of it to the yarn user. Then it will complete its execution.
          • --cgroup <PARAM>: This option will be used for launching containers. When invoked with this option, the LCE will add the process to the cgroup specified by the parameter.
        • The ResourceEnforcer will have the following methods (exactly as in the latest patch):
          • init(): called when the RM is initialized.
          • preExecute(containerId, Resource): called before launching the container.
          • wrapCommand(containerId, command): augments the execution command line before launching.
          • postExecute(containerId): called after launching the container.
        • A default implementation of the ResourceEnforcer will do NOPs.
        • The CgroupsResourceEnforcer implementation will do the following:
          • init(): call LCE --cgroupsinit
          • preExecute(containerId, Resource): configure the cgroup with the assigned cpu resources.
          • wrapCommand(containerId, command): augments the regular LCE invocation with the --cgroup option.
          • postExecute(containerId): any necessary cgroup clean up.
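
        Translated into code, the proposal above would amount to roughly the following; this is purely illustrative and the signatures are not taken from any of the attached patches.

        import java.util.List;
        import org.apache.hadoop.conf.Configurable;
        import org.apache.hadoop.yarn.api.records.ContainerId;
        import org.apache.hadoop.yarn.api.records.Resource;

        interface ResourceEnforcerSketch extends Configurable {
          void init();                                             // initialization: call LCE --cgroupsinit
          void preExecute(ContainerId id, Resource resource);      // configure the container's cgroup
          void wrapCommand(ContainerId id, List<String> command);  // inject the --cgroup option
          void postExecute(ContainerId id);                        // cgroup clean-up
        }

        A default implementation would simply leave all four methods empty, as described in the proposal.
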
        Karthik Kambatla added a comment -

        +1 on design - 2(b), and the patch looks good.

        Alejandro Abdelnur added a comment -

        I like the current patch, it does not add complexity and it will be trivial to wire it with MAPREDUCE-4327 once CPU units are part of resources.

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-executor-v4.patch [ 12538437 ]
        Andrew Ferguson added a comment -

        Updated version of executor-v3 which moves the actual wrapping of the launched command inside the "wrapCommand" method (which previously returned a value to prefix onto the launched command).

        Andrew Ferguson made changes -
        Attachment mapreduce-4334-design-doc-v2.txt [ 12538426 ]
        Andrew Ferguson added a comment -

        just a quick update to the design doc. at two points I wrote "create cgroups" when I meant "mount cgroups"; also fixes a typo. sorry for the spam!

        thanks,
        Andrew

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-executor-v3.patch [ 12538420 ]
        Andrew Ferguson added a comment -

        Updated version of "executor-v2" patch, which uses cgexec and hooks into the ContainersLauncher. See previously attached design doc for further details.

        thanks!
        Andrew

        Andrew Ferguson made changes -
        Attachment mapreduce-4334-design-doc.txt [ 12538417 ]
        Andrew Ferguson added a comment -

        Design document outlining the two primary designs proposed here, as well as an alternate version of the second. Summarizes pros/cons discussed earlier in the JIRA.

        More data, including screenshots from a live demo available here: http://www.cs.brown.edu/~adf/files/CgroupsPresentation.pptx

        Jeff Hammerbacher made changes -
        Link This issue is related to MAPREDUCE-4351 [ MAPREDUCE-4351 ]
        Jeff Hammerbacher made changes -
        Link This issue is related to MAPREDUCE-4327 [ MAPREDUCE-4327 ]
        Arun C Murthy added a comment -

        Andrew - I'll ask again.

        Can you please provide a simple writeup? I'm confused seeing new interfaces pop-up in every new patch. Thanks.

        Andrew Ferguson added a comment -

        Hi,

        Why is the ResourceEnforcer bubbled up all the way to the NodeManager instead of just being instantiated & configured in the ContainerLauncher, which seems to be where before() & after() are used, and then passed to the ContainerExecutor as a parameter in the launchContainer() method?

        the reason is that I was trying to pattern-match how the ContainerExecutor works, and the ContainerExecutor is instantiated by the NodeManager. If you think it makes more sense to break with the pattern and keep the ResourceEnforcer localized to the ContainersLauncher, then I can certainly do that.

        thanks! I will incorporate your other comments into the patch.

        Andrew

        Alejandro Abdelnur added a comment -

        I like the approach much better than the previous patch.

        • Why is the ResourceEnforcer bubbled up all the way to the NodeManager instead of just being instantiated & configured in the ContainerLauncher, which seems to be where before() & after() are used, and then passed to the ContainerExecutor as a parameter in the launchContainer() method?
        • The method names in the ResourceEnforcer seem a bit off. How about the following alternative names: before() -> preLaunch(), after() -> postLaunch() & commandPrefix -> wrapLauncherCommand()
        • Instead of having an init(Configuration conf) method in the ResourceEnforcer, why not make it implement Configurable and have an init() method? Then the configuration is set at instantiation by ReflectionUtils.newInstance().
        Andrew Ferguson added a comment -

        Forgot to mention that this version requires the "cgexec" binary, which, while not required for cgroups, is commonly available. If we choose not to introduce a dependency on cgexec, then we can return to modifying the C code in LinuxContainerExecutor, as the previous version of this patch did.

        thanks,
        Andrew

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-executor-v2.patch [ 12537924 ]
        Andrew Ferguson added a comment -

        New version of patch which hooks into the container executors to start cgroups.

        This creates a pluggable "ResourceEnforcer" which is called before & after creating a container. The ResourceEnforcer can also prefix the command to launch the container – this prevents us from having to modify the C code in LinuxContainerExecutor.

        This patch comes with two ResourceEnforcers: a default one, which does nothing, and a CgroupsResourceEnforcer.

        Andrew Ferguson added a comment -

        Hi Alejandro,

        thanks very much for looking at the patch & for the feedback. indeed, the patch should come with a no-op version which is enabled by default. (the current patch simply fails to find any cgroups if they are not configured, and then skips trying to use them.)

        I will update the patch tomorrow so it continues to have a lower impact on the codebase.

        thanks,
        Andrew

        Alejandro Abdelnur added a comment -

        The patch has TAB characters; it should not. Indentation should be 2 spaces.

        • ContainerExecutor.java

        Instead of having 2 different ConcurrentMaps, why not have one holding a data structure for pidFiles and cgroupFiles? (See the sketch after this comment.)

        Why do we need read/write locks when accessing a ConcurrentMap?

        • DefaultContainerExecutor.java

        The for loop adding the process ID to the cgroup should be within { }, even if it is a single line.

        • CgroupsCreator.java

        Shouldn't it, at initialization, enable/disable itself based on a config property that indicates whether Cgroups are enabled or not? And if disabled, all methods would be NOPs?
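
        A sketch of the single-map suggestion above; the class and field names are invented for illustration, and plain get/put on a ConcurrentMap needs no additional read/write locks.

        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.ConcurrentMap;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.yarn.api.records.ContainerId;

        final class ContainerFileRegistry {
          // One record per container instead of two parallel ConcurrentMaps.
          static final class ContainerFiles {
            final Path pidFile;
            final Path cgroupFile;
            ContainerFiles(Path pidFile, Path cgroupFile) {
              this.pidFile = pidFile;
              this.cgroupFile = cgroupFile;
            }
          }

          private final ConcurrentMap<ContainerId, ContainerFiles> files =
              new ConcurrentHashMap<ContainerId, ContainerFiles>();

          void register(ContainerId id, Path pidFile, Path cgroupFile) {
            files.put(id, new ContainerFiles(pidFile, cgroupFile));
          }

          ContainerFiles remove(ContainerId id) {
            return files.remove(id);
          }
        }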

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-executor-v1.patch [ 12537634 ]
        Andrew Ferguson added a comment -

        hello everyone,

        in this new version of the patch, the cgroups are established before the task is launched. then, updated versions of the LinuxContainerExecutor and DefaultContainerExecutor move the task into the cgroup.

        in this patch, the only cgroup which is used is one for CPU sharing.

        this patch does not have any other dependencies.

        thank you,
        Andrew

        Andrew Ferguson added a comment -

        Hi Hari,

        In my experiments, there are usually 200-400ms between starting to create the cgroups and having the process completely inside them. This number is likely an upper-bound, as the experiments are in pseudo-distributed mode on a VM.

        Note that in the design represented by this patch, I move the process into the cgroup asynchronously, so the latency is not incurred while starting the process. However, in my reading of Arun's comments, he would prefer that the cgroups be created synchronously while starting the job. I am currently in the process of making this change. While I suspect the cost may not be as high as 200-400ms, it will of course be non-zero.

        cheers,
        Andrew

        Hari Mankude added a comment -

        Relevant information would be the performance impact of running maps and reduces in cgroups in terms of latency.

        Overall, this would be a very useful feature since it is possible to add fencing around cpu/io resources in addition to memory usage for MR tasks.

        Bikas Saha added a comment -

        Aside from a design proposal I would be really interested in seeing how exactly cgroups work in the context of our typical workload. Say, take a bunch of typical mappers and reducers and run them in isolation. Then run them in isolation within cgroups. Is there a difference? Now run them concurrently with and without cgroups. What are the observations? These experiments may lead to expected or unexpected results and would be a great addition to the design pros and cons. Perhaps you have already run those experiments. If yes, care to share the results?

        Andrew Ferguson added a comment -

        Hi Arun,

        I feel like we've been discussing pros & cons for the length of this JIRA. I think, perhaps, I proposed too large of a change across this issue and MAPREDUCE-4351: cgroups for cpu, cgroups for memory, a code refactoring, etc.

        Instead, I would like to make a smaller change, with just cgroups for CPUs and place them in each launcher's code, as you requested above. Perhaps a better re-factoring than I suggested with the ContainersMonitor will become clear afterwards.

        How does this sound to you? I was planning to finish it up on Monday.

        best,
        Andrew

        Arun C Murthy added a comment -

        I disagree.

        Andrew, it seems we are stuck in the weeds debating minutia of the code.

        Let's take a step back.

        Can you please start by providing a writeup about your approach(es) and pros/cons? Thanks.

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-v2.patch [ 12536900 ]
        Andrew Ferguson added a comment -

        Updated patch which moves a whole process tree into a cgroup.

        Also sets the memory.move_charge_at_immigrate flag on the memory cgroup.

        Something else we could consider setting: memory.soft_limit_in_bytes – it provides soft limits for memory usage, although they are only best effort.

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-v1.patch [ 12536743 ]
        Andrew Ferguson added a comment -

        Following a suggestion from Todd Lipcon, this version uses Cgroups to simply balance the CPU resources consumed by containers (they are still used for memory limits). This allows us to fairly share CPU resources today, while MAPREDUCE-4327 is still being worked on.
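
        A minimal sketch of that fair-sharing idea: give every container the same cpu.shares weight so the kernel scheduler divides CPU time evenly among containers competing for it. The cgroup path is hypothetical; 1024 is simply the kernel's default weight.

        import java.io.FileWriter;
        import java.io.IOException;

        final class CpuShares {
          static void setEqualShare(String containerCgroupPath) throws IOException {
            FileWriter w = new FileWriter(containerCgroupPath + "/cpu.shares");
            try {
              w.write("1024");   // same weight for every container => equal shares
            } finally {
              w.close();
            }
          }
        }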

        Andrew Ferguson added a comment -

        Now, it seems like we should enhance the container-launch via LCE to just set the requisite cgroups or sched_affinity prior-to or right-after the container launch, rather than make them apis. That would be the safest, no?

        I disagree. That was the first approach I took for implementing this, but found it to be unsatisfactory for several reasons. See: https://issues.apache.org/jira/browse/MAPREDUCE-4334?focusedCommentId=13413913&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13413913 starting at "My first design for this..."

        Robert Joseph Evans added a comment -

        I agree with Bikas and Arun to a point. I can see some situations, like running a multi-tenant Hadoop cloud, where you do want strict isolation, so that the people who are paying a premium to get consistent results from their part of the cluster never have to worry about someone else doing something really bad on another part of the cluster. Is this enough of a concern to make it the default? I would say no. Is it enough of a concern to make it an option that comes with and is maintained by Hadoop? That is TBD; I don't plan on running my clusters that way, but I am not the only Hadoop customer. Arun, didn't you mention something at Hadoop Summit about some discussions you had with people who want full VMs to run their containers in, specifically for isolation purposes?

        As for memory spikes, at least on Linux I thought you could configure swap on Linux containers so that if a container goes over its budget, i.e. spikes, then it swaps to disk instead of launching the OOM killer. I could be wrong, I have not dug into it very much.

        Arun C Murthy added a comment -

        So, concretely, this is my proposal:
        recognize the LCE binary as the "hadoop root tool"
        the LCE will have two new functionalities: 1) sched_setaffinity and 2) creating cgroups
        in addition to the patch above, I will create 1) another pluggable ContainersMonitor which can use these new functions (sched_setaffinity) and 2) adapt the one above to optionally use the (creating cgroups) functionality of the "hadoop root tool"

        Thanks, looks like we finally are on the same page - it's what I've been proposing for a while now.

        Now, it seems like we should enhance the container-launch via LCE to just set the requisite cgroups or sched_affinity prior-to or right-after the container launch, rather than make them apis. That would be the safest, no?

        Arun C Murthy added a comment -

        Good points Bikas, I tend to agree with them.

        In the past we used OS limits (via ulimit) and had several issues with temporary spikes (particularly with Java processes forking) and hence we moved away from OS limits to a custom-built one which ignores spikes etc.

        Andrew Ferguson added a comment -

        Hi Bikas, thanks for thinking about this! Comments inline:

        Somewhere in this thread it was mentioned controlling memory via OS. In my experience this is not an optimal choice because

        1) it makes it hard to debug task failures due to memory issues: abrupt OS termination, or denial of more memory resulting in NPEs/bad pointers, etc. It's better to just monitor the memory and then enforce limits with a clear error message saying the task was terminated because it used more memory than allotted.

        On Linux, enforcing memory limits via Cgroups feels a bit like simply running a process on a machine with less memory installed. When the memory allocation is pushing the threshold, the Linux OOM killer destroys the task. The patch above detects that the process has been killed and logs an error message indicating that the task was killed for consuming too many resources.

        2) due to different scenarios, tasks may have memory spikes or temporary increases. The OS will enforce tight limits but NodeManager monitoring can be more flexible and not terminate a task because it shot to 2.1GB instead of staying under 2.

        I would argue that the strict enforcement of Cgroups is exactly the behavior we want because it provides isolation. If two containers are running on a node with 4 GB of RAM, and each is using 2 GB, and one happens to spike to 3 GB momentarily, the spiking container should suffer – if we continue monitoring the memory as done today, then the well-behaved container might suffer by being swapped out to make room for the spiking container.

        I believe the spiking concern is mitigated by the fact that Cgroups allows you to set both a physical memory limit and a virtual memory limit (which my patch above makes use of). For example, I set the physical memory limit to, say, 1 GB of RAM, and the virtual memory limit to 2.1 GB. When a process momentarily spikes above its 1 GB of RAM, it will be allocated memory from swap without a problem. This is configurable by the already extant "yarn.nodemanager.vmem-pmem-ratio" setting.
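
        For concreteness, the two-limit scheme described above could be expressed with the standard cgroup memory controller files as below; whether the patch writes exactly these files is not confirmed here, so treat the paths as an assumption.

        import java.io.FileWriter;
        import java.io.IOException;

        final class MemoryLimits {
          static void apply(String cgroupPath, long physicalBytes, float vmemPmemRatio)
              throws IOException {
            long vmemBytes = (long) (physicalBytes * vmemPmemRatio);
            // Physical limit first; beyond it the container pages to swap.
            write(cgroupPath + "/memory.limit_in_bytes", Long.toString(physicalBytes));
            // Memory+swap limit (must be >= the physical limit); beyond it the
            // kernel OOM killer terminates the container.
            write(cgroupPath + "/memory.memsw.limit_in_bytes", Long.toString(vmemBytes));
          }

          private static void write(String file, String value) throws IOException {
            FileWriter w = new FileWriter(file);
            try {
              w.write(value);
            } finally {
              w.close();
            }
          }
        }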

        Disk scheduling and monitoring would be a hard to achieve goal with multiple writers to disk spinning things their own way and expecting something that will likely not happen.

        Sure, it is tricky, and the feasibility depends on the semantics YARN promises applications. However, the Linux Completely Fair Queuing I/O scheduler has semantics which are quite similar to the semantics I'm proposing we promise for CPUs (proportional weights). The blkio Cgroup subsystem already today provides both proportional sharing and throttling: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html#sec-blkio

        Network scheduling and monitoring shares choke points at multiple levels beyond the machines and trying to optimally and proportionally use the network tends to be a problem thats better served globally.

        YARN is a global scheduler. Linux traffic controls [1], in combination with the network controller for Cgroups, can be used to implement the results of Seawall [2], FairCloud [3], and similar projects. There are many datacenter designs these days; some will be a perfect match for end-host-only bandwidth control, and others an imperfect match. While end-host-only bandwidth control is not a magic bullet, I strongly believe that it is both useful enough, and easy enough to implement, to warrant pursuit.

        My 2 cents would be to limit this to just CPU for now.

        It is. However, I believe the patch above is easily extensible to other resources (you can see for yourself that there is a small difference between the memory-only patch, and the memory+cpu patch).

        Based on the comments above, I would agree that we need to make sure platform specific stuff should not leak into the code so that other platforms (imminently Windows) can support this stuff.

        Totally agree. That's why I proposed making it pluggable with MAPREDUCE-4351.

        An alternative to a pluggable ContainersMonitor would be to make CPU management a pluggable component of ContainersManager. My POV is that ContainersManager manages the resources of containers and has logic that will be common across platforms. The tools it uses will change. E.g. ProcfsBasedProcessTree is the tool used to monitor and manage memory. I can see that being changed to a MemoryMonitor interface with platform-specific implementations. That's what's happening on the Windows port in branch-1. I can see a CPUMonitor interface for CPU. Or maybe a ResourceMonitor that has methods for both memory and CPU.

        I'm afraid I'm a bit confused by your suggestion here – ContainersMonitor is already a part of the ContainersManager. Are you proposing that we create a pluggable interface for each type of resource independently? Perhaps you can point me to the code & branch which has the suggestion you are describing? There are two pieces to resource management: monitoring & enforcement, and both are platform-specific. Because multiple Linux enforcement solutions (the current Java-native, the above Cgroups, and the planned taskset) can all use the same Linux-specific monitoring code, it seems reasonable to keep the two features separate. The monitoring code is already pluggable (ResourceCalculatorPlugin).

        Thanks!
        Andrew

        [1] http://lartc.org/howto/ and 'man tc'
        [2] http://research.microsoft.com/en-us/UM/people/srikanth/data/nsdi11_seawall.pdf
        [3] http://www.hpl.hp.com/people/lucian_popa/faircloud_hotnets.pdf

        Bikas Saha added a comment -

        Somewhere in this thread it was mentioned controlling memory via OS. In my experience this is not an optimal choice because
        1) it makes it hard to debug task failures due to memory issues. Abrupt OS termination or denial of more memory results in NPEs/bad pointers etc. It's better to just monitor the memory and then enforce limits with a clear error message saying: task was terminated because it used more memory than allotted.
        2) due to different scenarios, tasks may have memory spikes or temporary increases. The OS will enforce tight limits, but NodeManager monitoring can be more flexible and not terminate a task because it shot to 2.1 GB instead of staying under 2 GB.

        Disk scheduling and monitoring would be a hard goal to achieve, with multiple writers each spinning the disk their own way; expecting coordinated behavior is something that will likely not happen. Network scheduling and monitoring share choke points at multiple levels beyond the machines, and trying to use the network optimally and proportionally tends to be a problem that's better addressed globally.

        My 2 cents would be to limit this to just CPU for now. Based on the comments above, I would agree that we need to make sure platform-specific stuff does not leak into the code so that other platforms (imminently Windows) can support this stuff.
        An alternative to pluggable ContainersMonitor would be to make CPU management a pluggable component of ContainersManager. My POV is that ContainersManager manages the resources of containers and has logic that will be common across platforms. The tools it uses will change. E.g., ProcfsBasedProcessTree is the tool used to monitor and manage memory. I can see that being changed to a MemoryMonitor interface with platform-specific implementations. That's what's happening on the Windows port in branch 1. I can see a CPUMonitor interface for CPU. Or maybe a ResourceMonitor that has methods for both memory and CPU.

        Andrew Ferguson added a comment -

        Arun – I think we might be talking past each other, as we agree that both cgroups and taskset should be available.

        BTW, it turns out the sched_setaffinity() syscall does not require root if it is applied to a process you own. Therefore, if you are running with the DefaultContainerExecutor, you can still use sched_setaffinity, which is excellent.
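
        For example, any user can already re-pin a process they own (the PID and CPU list here are illustrative):

        $ taskset -cp 0,1 12345        # pin my own PID 12345 to cores 0 and 1 -- no root needed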

        I think this is the matrix of possible use cases:
        1) launch container as user & use sched_setaffinity / taskset / CPU pinning
        2) launch container as user & use cgroups completely managed by Hadoop
        3) launch container as user & use cgroups managed by the cluster operator
        4) launch container as Hadoop & use sched_setaffinity / taskset / CPU pinning
        5) launch container as Hadoop & use cgroups completely managed by Hadoop
        6) launch container as Hadoop & use cgroups managed by the cluster operator

        Cases 1, 2, 3 and 5 require root privs.

        Cases 3 and 6 are covered by the patch above.

        I'm happy to expand the LCE into a "hadoop root tool" which can be used in cases 1, 2, 3, and 5.

        In my mind, the design question is how to cover all six cases with the most amount of code re-use.

        Today, we have two important ContainerManager subsystems: the Launcher and the Monitor. Resource enforcement is currently done entirely within the Monitor. The question is, where should new resource enforcement be done? I think the answer is still "in the Monitor" even though, in some use cases, it needs access to root privs. To get access to those privs, it can call the LCE binary (aka the "hadoop root tool"), just as the java-side of the LCE does today.

        So, concretely, this is my proposal:

        • recognize the LCE binary as the "hadoop root tool"
        • the LCE will have two new functionalities: 1) sched_setaffinity and 2) creating cgroups
        • in addition to the patch above, I will create 1) another pluggable ContainersMonitor which can use these new functions (sched_setaffinity) and 2) adapt the one above to optionally use the (creating cgroups) functionality of the "hadoop root tool"

        how does that sound?

        Arun C Murthy added a comment -

        Andrew - please don't take this the wrong way, I certainly am not trying to debate taskset v/s cgroups. All I'm saying is 'we need both' for the dominant platforms: RHEL5 and RHEL6. I perfectly understand that you might not have the time or the inclination to do both, and I'm happy to help, personally - supporting just RHEL6 isn't enough.

        Given that, we have two options:

        1. Admin-setup cgroups (outside YARN)
        2. YARN handles it on it's own via LCE

        Now the pros of using LCE:

        1. It already exists! Hence it doesn't require any new operational requirements.
        2. It's consistent for both technologies/platforms we need to support: taskset/RHEL5 and cgroups/RHEL6.
        3. Even better, we can use the same for any platform in the future e.g. WindowsContainerExecutor (for e.g. we already have WindowsTaskController in branch-1-win and would need to get ported to branch-2 soon).
        4. It's much less overhead for admins - they don't have to create cgroups upfront, and they don't have to mount them to get them to survive reboots, etc.

        Cons:

        1. Need LCE for non-secure setups. We actually did support LTC without security in branch-1 at some point, happy to discuss.

        In the alternative (admin-setup cgroups) we will still need the LCE (or worse, another setuid script) to support taskset. To me that is a very bad choice.

        As a result, using LCE seems like a significantly superior alternative.


        Some other comments:

        In my mind, the LCE is for starting processes, and should stick to doing that.

        Not true at all, we already use it for container cleanup etc.

        4) For cgroups, we could have a second ContainersMonitor plugin which uses a setuid root binary to also mount & create cgroups, freeing the admin from managing them at all.
        5) For taskset, we can implement a ContainersMonitor which uses a setuid root binary (potentially the LCE, but perhaps better if it's something else, just to keep the security footprint down) to pin processes to CPUs. This ContainersMonitor will also need the memory enforcement code from the current ContainersMonitorImpl

        Like I said above, having two ways to do the same thing when we can manage with one existing component doesn't make sense, i.e. the LCE seems like a clear choice.

        I understand you might not have time to port your work via LCE, I'm happy to either help or take up that work.

        Andrew Ferguson added a comment -

        hi all, I think there are pros and cons to both approaches, which I will try to outline below.

        Cgroups:

        • they provide a coherent path for future resource management: network bandwidth, CPU upper- and lower-bounds, block I/O priorities and limits, etc. [1]
        • can be integrated with resource management for other applications, drawing upon a single resource budget for a group of users
        • cgroup's hierarchies are key to this. in a taskset-only world, the NM would need to be given a fixed allocation of the node's CPUs to manage
        • cgroups are not persistent across reboots. this is unfortunate. however, 1) anyone using them needs to mount them on startup, so they will need to make a change to their startup process already, and 2) there are extensive, cross-distro tools to create and manage cgroups automatically on reboot (RHEL 6 has great docs on them [2]); a minimal boot-time sketch follows this list
        • some clusters are already using Cgroups, without any support from Hadoop/YARN. for example, StumbleUpon [3]
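
        The boot-time sketch mentioned above (purely illustrative; the mount point and user name mirror the examples elsewhere in this issue, and admins would more likely use their distro's cgconfig service instead of a raw script):

        # run once at boot, e.g. from an init script
        mount -t cgroup -o memory,cpu none /cgroups/yarn
        cgcreate -a hadoop_user_name -g memory,cpu:hadoop-yarn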

        Taskset:

        • compatible with RHEL 5
        • does not require changes to node startup
        • can be implemented with a SUID root binary, as LCE is today

        My first design for this JIRA had the LCE create the cgroups. This turned out to be the wrong approach for several reasons:

        • What if I wanted to use the regular container executor with cgroups? An admin may not allow me to have a setuid root binary, but may be willing to create a cgroup hierarchy for me (after all, this is one advantage of the hierarchy: delegation)
        • Conversely, what if I wanted to use the LCE without cgroups?
        • There needs to be a part of the NM responsible for deleting unused cgroups, and for the other tasks of a ContainersManager I described in MAPREDUCE-4351. Some of those are specific to how resource enforcement is being done; it seemed best to keep that code together in the ContainersManager, rather than spread it across a ContainersManager and the LCE.
        • Putting the resource enforcement "smarts" in the ContainersMonitor (which is already receiving events from the RM), allows it to dynamically adjust the resource enforcement
        • On startup, the JVM can appear to be using twice as much memory as it actually is (see comment in ContainersMonitorImpl.java). By starting the JVM within the cgroup, rather than allowing it to start outside the cgroup and moving it into the cgroup with a ContainersMonitor as my patch above does, the kernel may kill the JVM inadvertently.

        I really like the flexibility of keeping the LCE and resource enforcement separate. In my mind, the LCE is for starting processes, and should stick to doing that. Resource enforcement is a separate job.

        My recommendation is the following:
        1) Keep the LCE as it is.
        2) Support pluggable ContainersMonitors (MAPREDUCE-4351)
        3) For cgroups, we can start with the patch above. It is best for admins who already use cgroups on their nodes and want to have YARN take advantage of them. (This is the point of the yarn.nodemanager.cgroups.path config option I added)
        4) For cgroups, we could have a second ContainersMonitor plugin which uses a setuid root binary to also mount & create cgroups, freeing the admin from managing them at all.
        5) For taskset, we can implement a ContainersMonitor which uses a setuid root binary (potentially the LCE, but perhaps better if it's something else, just to keep the security footprint down) to pin processes to CPUs. This ContainersMonitor will also need the memory enforcement code from the current ContainersMonitorImpl

        I've done 1-3 (well, #1 is a freebie) ... and I can definitely do #5 as well.

        Arun, does this design appeal to you?

        [1] http://www.linux-kongress.org/2010/slides/seyfried-cgroups-linux-kongress-2010-presentation.pdf
        [2] https://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/index.html
        [3] http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html

        Arun C Murthy added a comment -

        Clearly, we need to support taskset for platforms on which cgroups isn't supported e.g. RHEL5. For taskset you need super-user privs - would you prefer packages to do it too?

        I meant to say: for taskset we clearly need to go via LCE at runtime.

        Arun C Murthy added a comment -

        Also, it does look like cgroups might not be persisted across reboots - that just makes them much worse to deal with.

        Arun C Murthy added a comment -

        How is it onerous? Packages could easily do this as part of the install on platforms where it's supported.

        This doesn't make sense. What if CPU isolation is disabled? Do you still want 'packages' to make it part of the install?

        Clearly, we need to support taskset for platforms on which cgroups isn't supported e.g. RHEL5. For taskset you need super-user privs - would you prefer packages to do it too?

        Yes, LTC is a pain, but using it consistently (e.g. for both cgroups and taskset) seems better than having multiple steps forced on the admin (LCE + cgroups + taskset etc.).

        Todd Lipcon added a comment -

        Preventing such onerous requirements on cluster setup is a key goal - something which initially led to creation of LinuxTaskController etc.

        How is it onerous? Packages could easily do this as part of the install on platforms where it's supported.

        It seems equivalent to the installation of the LTC itself, which requires root to make it setuid, right?

        Andrew: do the cgroups persist cross-reboot, or does that cgcreate command need to go in the startup scripts?

        Arun C Murthy added a comment -

        Andrew, thanks, I missed that comment.

        I'm concerned that asking admins to set up cgroups etc. via cgcreate prior to deploying Hadoop clusters, particularly on all nodes, is almost a non-starter.

        Preventing such onerous requirements on cluster setup is a key goal - something which initially led to creation of LinuxTaskController etc.

        I'd strongly urge that we implement this functionality via the LinuxContainerExecutor - thereby allowing us to write low-level platform-specific code (RHEL5 v/s RHEL6 etc.) in a single place and not rely on tedious Java code for the same.

        Thoughts?

        Andrew Ferguson added a comment -

        @Arun: no, the NM does not need superuser privs. in my comment above [1], the line "$ sudo cgcreate -a hadoop_user_name -g memory:hadoop-yarn" is run when installing Hadoop. This creates a branch of the memory hierarchy called "hadoop-yarn" which is owned by the user "hadoop_user_name" (which would be the user running the NM). This allows the NM to create and move cgroups without superuser privs.
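
        For illustration, once that hierarchy exists the NM user can manage per-container groups with plain filesystem operations (the paths, PID, and size are made up):

        $ mkdir /cgroups/mem/hadoop-yarn/container_X
        $ echo 1073741824 > /cgroups/mem/hadoop-yarn/container_X/memory.limit_in_bytes   # 1 GB
        $ echo 12345 > /cgroups/mem/hadoop-yarn/container_X/cgroup.procs                 # move a process the NM user owns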

        The one complication is only the superuser or the owner of a process may move a process into a cgroup. As the LinuxContainerExecutor runs processes under different user accounts, we will need to either augment it, or use a similar tool to move such processes into a cgroup created by the NM user.

        Let me know if you'd like further clarification.

        [1] https://issues.apache.org/jira/browse/MAPREDUCE-4334?focusedCommentId=13399014&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13399014

        Arun C Murthy added a comment -

        Andrew, what are the security implications here? Does the NM need superuser privs to create/move cgroups?

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-pre3-with_cpu.patch [ 12535763 ]
        Andrew Ferguson added a comment -

        Same as previous update, for the version with CPU sharing as well as memory limits. This has been tested with the latest patches to MAPREDUCE-4351 and MAPREDUCE-4327.

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-pre3.patch [ 12535762 ]
        Andrew Ferguson added a comment -

        This is a small edit to the previous patch. It now writes the process ID to "cgroup.procs" instead of "tasks" so other kernel threads started by the same process stay in the cgroup.
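
        Roughly, the difference is (paths and IDs are illustrative): writing a PID to cgroup.procs moves the whole thread group, while writing to tasks moves only a single thread.

        $ echo 12345 > /cgroups/mem/hadoop-yarn/container_X/cgroup.procs   # the process and all of its threads
        $ echo 12346 > /cgroups/mem/hadoop-yarn/container_X/tasks          # just one thread ID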

        Todd Lipcon made changes -
        Assignee Arun C Murthy [ acmurthy ] Andrew Ferguson [ adferguson ]
        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-pre2-with_cpu.patch [ 12533087 ]
        Andrew Ferguson added a comment -

        This is an amended version of pre2 which also enforces CPU weights, as set by MAPREDUCE-4327.

        Of course, since applications currently do not request any cores, it happily enforces a limit of 0, preventing tasks from running. Adding reasonable default CPU core requests should probably be part of MAPREDUCE-4327.

        It can be tested using the same procedure as above (for the pre2 version), with the additional step of mounting the cpu cgroup.

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-pre2.patch [ 12532971 ]
        Andrew Ferguson added a comment -

        This version is ready for testing. It has the following requirements:

        1) Apply patch in MAPREDUCE-4351. This allows you to set yarn.nodemanager.containers-monitor.class to o.a.h.yarn.server.nodemanager.containermanager.monitor.CgroupsContainersMonitor

        2) Mount the cgroups memory controller at a path of your choosing. For example:

        $ sudo mount -t cgroup -o memory none /cgroups/mem

        The NodeManager will detect where you have mounted the cgroups.

        3) Create a cgroups hierarchy which Hadoop can use. This is most easily done with:

        $ sudo cgcreate -a hadoop_user_name -g memory:hadoop-yarn

        "hadoop-yarn" is the default hierarchy the NodeManager expects; this can be configured with yarn.nodemanager.cgroups.path.

        that's it!
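
        Optionally, you can sanity-check the setup before starting the NodeManager (these commands are just illustrative checks, not part of the patch):

        $ grep cgroup /proc/mounts            # confirm where the memory controller is mounted
        $ ls -ld /cgroups/mem/hadoop-yarn     # confirm the hierarchy is owned by the Hadoop user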

        I have tested that it enforces memory limits, and reacts appropriately when the kernel kills processes, or when they complete successfully. It also notifies the user when cgroups have been mis-configured (for example, if the Hadoop user does not have write access to the cgroup hierarchy).

        Currently, it only enforces memory limits, as per the trunk code. I am planning to augment the patch in MAPREDUCE-4327 to provide CPU limits to the ContainersMonitor. It is easy to extend this patch to any other cgroup controller.

        thank you,
        Andrew

        Andrew Ferguson made changes -
        Attachment MAPREDUCE-4334-pre1.patch [ 12532941 ]
        Andrew Ferguson added a comment -

        This is a preliminary patch to add support for using cgroups to do resource isolation and enforcement. It requires MAPREDUCE-4351, which provides pluggable ContainersMonitors.

        This patch currently assumes that the memory cgroups controller is mounted at "/cgroups/mem" and that a "/cgroups/mem/hadoop-yarn" group exists which is writable by the Hadoop user (this is configurable by yarn-site.xml). I will fix these assumptions shortly, but wanted to get the preliminary patch out for discussion.

        thanks!

        Robert Joseph Evans made changes -
        Field Original Value New Value
        Link This issue is related to MAPREDUCE-4256 [ MAPREDUCE-4256 ]
        Andrew Ferguson added a comment -

        ok, putting all of this in the ContainerExecutor is not the way to go, as it precludes use of secure Hadoop's Linux container-executor.

        In my new design, ContainerMonitor will be a pluggable component, just as ContainerExecutor is now. Then, we can provide a ContainerMonitor which uses cgroups to control resource usage, rather than the existing ContainerMonitor (to be renamed as "DefaultContainerMonitor"). This has several advantages:
        1) allows us to keep existing ContainerMonitor for users who can't use cgroups (eg, users without root access during Hadoop setup)
        2) ContainerMonitor already receives an event when it's time to stop monitoring, which we can use as notification to delete the container's cgroup
        3) ContainerMonitor receives the resource limits already; no need to calculate them based on the configs
        4) A pluggable ContainerMonitor paves the way for ContainerMonitors on other platforms

        I will first open a sub-task to make ContainerMonitor pluggable.

        The only trouble spot with this design is that it's not possible to move another non-root user's process into a cgroup. I plan to extend the secure container-executor to be able to make such a move.

        Please let me know if you have any feedback about this proposal.

        thank you,
        Andrew

        Andrew Ferguson added a comment -

        Hi Arun,

        I've thought some more about implementing taskset since our chat at the YARN meet-up.

        One benefit of cgroups is they're "set it and forget it" – in the ContainerExecutor, we simply place the new task in the appropriate cgroup, and the kernel will take care of the rest. This would allow us to ditch the ContainersMonitor infrastructure.

        On the other hand, with taskset, we will need to do the CPU scheduling ourselves. Say I have two cores and start with two processes, A (requested 0.5 cores) and B (requested 0.5 cores). I can start by putting them both on core 1 for efficiency, or I can put them on separate cores for higher utilization. But if process C (requested 1 core) comes along, I will need to set A & B to the same core. This is just a simple scenario, but more cores and processes will likely grow a complicated CPU scheduler inside the NodeManager (ContainersMonitorImpl is probably the right place, since it is already monitoring container resource usage).
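
        Concretely, with taskset the NM would keep re-issuing pinning commands as containers come and go (PIDs and core numbers are illustrative):

        $ taskset -cp 0 $PID_A ; taskset -cp 1 $PID_B     # A and B get a core each while there is room
        $ taskset -cp 0 $PID_A ; taskset -cp 0 $PID_B     # repack A and B onto core 0 ...
        $ taskset -cp 1 $PID_C                            # ... so C can have core 1 to itself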

        tl;dr – I believe cgroups requires only local state when launching containers, while taskset requires us to maintain global state.

        thoughts?

        thanks!
        Andrew

        Andrew Ferguson added a comment -

        hi Arun,

        I've actually been looking into this recently myself, and would be happy to take the lead on it. So far, I've been focusing on cgroups as they also provide memory containment, and provide a path for managing future resources as well. although taskset is available on RHEL5, it's not capable of isolating fractions of a CPU.

        while cgroups' memory support gives an upper-bound on the amount of memory tasks can consume, the RHEL6 cpu support is actually a lower-bound. until CFS bandwidth control [1] is more widespread, we can place tasks judiciously to create guarantees, building on cgroups to ensure the lower-bounds.
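
        In cgroup terms, a sketch (not from the patch; the path and numbers are illustrative): cpu.shares provides the proportional lower bound discussed here, while CFS bandwidth control adds an upper bound via a quota per scheduling period.

        $ echo 512 > /cgroups/cpu/hadoop-yarn/container_X/cpu.shares             # relative weight (lower bound under contention)
        $ echo 100000 > /cgroups/cpu/hadoop-yarn/container_X/cpu.cfs_period_us   # only on kernels with CFS bandwidth control
        $ echo 50000 > /cgroups/cpu/hadoop-yarn/container_X/cpu.cfs_quota_us     # cap at half a core per period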

        best,
        Andrew

        [1] for a quick overview: http://lwn.net/Articles/428230/ ... more in-depth discussion here: http://www.kernel.org/doc/ols/2010/ols2010-pages-245-254.pdf

        Arun C Murthy created issue -

          People

          • Assignee: Andrew Ferguson
          • Reporter: Arun C Murthy
          • Votes: 0
          • Watchers: 41
