Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: nodemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
      Attachments

    1. YARN-1856.001.patch
      20 kB
      Varun Vasudev
    2. YARN-1856.002.patch
      23 kB
      Varun Vasudev
    3. YARN-1856.003.patch
      23 kB
      Varun Vasudev
    4. YARN-1856.004.patch
      24 kB
      Varun Vasudev

        Activity

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Duplicate of YARN-3?

        kkambatl Karthik Kambatla (Inactive) added a comment -

        Nope. YARN-3, IIUC, is just for CPU. Also, we don't want to enforce memory through cgroups - this is just for monitoring.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        When we use cgroups, we don't need (or want) explicit monitoring. Cgroups will constrain the memory usage of the process (and its tree) if the right values are set when creating the group. There were discussions about this on YARN-3 and related JIRAs. In essence, the ContainersMonitor is really a monitor to be used only when such an OS feature is not available to properly constrain memory usage.

        kkambatl Karthik Kambatla (Inactive) added a comment -

        As discussed on YARN-3, using cgroups for memory isolation/enforcement can be problematic as it enforces an upper-bound on the amount of memory tasks can consume and hence doesn't tolerate any momentary spikes.

        Using it for monitoring, however, would help address YARN-1747.

        I haven't yet looked at the cgroups-related source closely enough. I'll post an update once I do.

        kkambatl Karthik Kambatla (Inactive) added a comment -

        When we use cgroups, we don't need (and want) explicit monitoring.

        If we set the limits much higher than what we want to enforce, we can use them for monitoring instead. The goal, again, is not to enforce.
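        (A minimal sketch of the monitoring idea, assuming a cgroup v1 hierarchy mounted under /sys/fs/cgroup/memory with a hadoop-yarn parent group; the path and class name are illustrative, not from any patch on this JIRA.)

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Paths;

        // Illustrative only: read the kernel's memory accounting for a
        // container's cgroup instead of polling /proc.
        public class CgroupMemoryUsageReader {
          public static long usageInBytes(String containerId) throws IOException {
            String file = "/sys/fs/cgroup/memory/hadoop-yarn/" + containerId
                + "/memory.usage_in_bytes";
            return Long.parseLong(Files.readAllLines(Paths.get(file)).get(0).trim());
          }
        }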

        beckham007 Beckham007 added a comment -

        We have worked on this for a few days. We will validate it in our production environment, which has 4000 nodes.
        We set memory.limit_in_bytes for /cgroup/memory/hadoop-yarn and memory.soft_limit_in_bytes for each container. We also use cgroup.event_control to handle OOM events.
        Mesos uses a similar policy for memory isolation.
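        (A hedged sketch of the setup described above, assuming cgroup v1 paths as given in the comment; the cgroup.event_control registration requires an eventfd and is omitted here. Class and method names are hypothetical.)

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Paths;

        // Illustrative only: hard limit on the parent hadoop-yarn cgroup and a
        // per-container soft limit, mirroring the comment above. Values in bytes.
        public class CgroupMemoryLimits {
          private static final String BASE = "/cgroup/memory/hadoop-yarn";

          static void write(String path, long value) throws IOException {
            Files.write(Paths.get(path), Long.toString(value).getBytes());
          }

          public static void apply(String containerId, long parentLimit,
              long softLimit) throws IOException {
            write(BASE + "/memory.limit_in_bytes", parentLimit);
            write(BASE + "/" + containerId + "/memory.soft_limit_in_bytes",
                softLimit);
          }
        }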

        kasha Karthik Kambatla added a comment -

        I haven't had a chance to work on this further. Beckham007 - how did your testing go? Please feel free to take this JIRA over if you want to contribute what you guys have done.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Moving all tickets targeted for the already closed release 2.6.0 into 2.6.1/2.7.2.

        vvasudev Varun Vasudev added a comment -

        Assigning to myself to take a crack at it.

        sjlee0 Sangjin Lee added a comment -

        Should this be targeted to 2.6.2? We're trying to release 2.6.1 soon. Let me know.

        sjlee0 Sangjin Lee added a comment -

        Unless the patch is ready to go and the JIRA is a critical fix, we'll defer it to 2.6.2. Let me know if you have comments. Thanks!

        sjlee0 Sangjin Lee added a comment -

        Targeting 2.6.3 now that 2.6.2 has shipped.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Moving out all non-critical / non-blocker issues that didn't make it out of 2.7.2 into 2.7.3.

        djp Junping Du added a comment -

        Hi, can we move this out of 2.6.3? Thanks!

        vvasudev Varun Vasudev added a comment -

        Moved it out of 2.6.3 and 2.7.3.

        vvasudev Varun Vasudev added a comment -

        Uploaded a patch that adds support for cgroups based memory monitoring. I found that the default setting for swappiness results in a significant change in behaviour compared to the existing pmem monitor. I've added a configuration to let admins set the swappiness value, with the default being 0.
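        (For illustration, a sketch of enabling the new handler programmatically via the configuration keys the patch adds; the wrapper class here is hypothetical.)

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.yarn.conf.YarnConfiguration;

        // Sketch only: turn on the cgroups memory handler and keep swappiness
        // at its default of 0.
        public class MemoryHandlerConfig {
          public static Configuration build() {
            Configuration conf = new YarnConfiguration();
            conf.setBoolean(YarnConfiguration.NM_MEMORY_RESOURCE_ENABLED, true);
            conf.setInt(YarnConfiguration.NM_MEMORY_RESOURCE_CGROUPS_SWAPPINESS,
                YarnConfiguration.DEFAULT_NM_MEMORY_RESOURCE_CGROUPS_SWAPPINESS);
            return conf;
          }
        }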

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 9m 16s trunk passed
        +1 compile 2m 30s trunk passed with JDK v1.8.0_66
        +1 compile 2m 35s trunk passed with JDK v1.7.0_85
        +1 checkstyle 0m 32s trunk passed
        +1 mvnsite 1m 8s trunk passed
        +1 mvneclipse 0m 29s trunk passed
        +1 findbugs 2m 44s trunk passed
        +1 javadoc 1m 8s trunk passed with JDK v1.8.0_66
        +1 javadoc 3m 23s trunk passed with JDK v1.7.0_85
        +1 mvninstall 1m 2s the patch passed
        +1 compile 2m 31s the patch passed with JDK v1.8.0_66
        +1 javac 2m 31s the patch passed
        +1 compile 2m 35s the patch passed with JDK v1.7.0_85
        +1 javac 2m 35s the patch passed
        -1 checkstyle 0m 33s Patch generated 8 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 238, now 245).
        +1 mvnsite 1m 7s the patch passed
        +1 mvneclipse 0m 29s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 3m 2s the patch passed
        +1 javadoc 1m 9s the patch passed with JDK v1.8.0_66
        +1 javadoc 3m 33s the patch passed with JDK v1.7.0_85
        +1 unit 0m 29s hadoop-yarn-api in the patch passed with JDK v1.8.0_66.
        +1 unit 9m 36s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
        +1 unit 0m 28s hadoop-yarn-api in the patch passed with JDK v1.7.0_85.
        +1 unit 9m 46s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85.
        +1 asflicense 0m 26s Patch does not generate ASF License warnings.
        62m 1s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12774801/YARN-1856.001.patch
        JIRA Issue YARN-1856
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux c98206b940c3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / c37c3f4
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/9811/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
        JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9811/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
        Max memory used 76MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/9811/console

        This message was automatically generated.

        sidharta-s Sidharta Seethana added a comment -

        Hi Varun Vasudev,

        Thanks for the patch. Please see below for some comments.

        @VisibleForTesting
        static final String HARD_LIMIT = "limit_in_bytes";
        @VisibleForTesting
        static final String SOFT_LIMIT = "soft_limit_in_bytes";
        @VisibleForTesting
        static final String SWAPPINESS = "swappiness";
        

        Could you move these constants to CGroupsHandler? There are already some cgroups parameter constants defined there.

        if (configuration.getBoolean(YarnConfiguration.NM_PMEM_CHECK_ENABLED,
            YarnConfiguration.DEFAULT_NM_PMEM_CHECK_ENABLED)) {
          LOG.warn("You have enabled the default YARN physical memory health"
              + " checker as well as the CGroups memory controller. This could"
              + " lead to unpredictable behaviour");
        }
        

        IMO, if the behavior here is unpredictable, we should simply error out here if both are enabled.

        long softLimit = (long) (container.getResource().getMemory() * 0.90f);
        

        We should make the fraction configurable, I think. What are the implications of the soft limit?

        public static final String NM_MEMORY_RESOURCE_PREFIX = NM_PREFIX
            + "resource.memory.";
        
        public static final String NM_MEMORY_RESOURCE_ENABLED =
            NM_MEMORY_RESOURCE_PREFIX + "enabled";
        public static final boolean DEFAULT_NM_MEMORY_RESOURCE_ENABLED = false;
        
        public static final String NM_MEMORY_RESOURCE_CGROUPS_SWAPPINESS =
            NM_MEMORY_RESOURCE_PREFIX + "cgroups.swappiness";
        public static final int DEFAULT_NM_MEMORY_RESOURCE_CGROUPS_SWAPPINESS = 0;
        

        Since we are skipping changes to yarn-default.xml (based on changes I see in TestYarnConfigurationFields), these should be marked @Private, similar to how network/disk config settings are annotated?

        Thinking aloud here: should we add support in some form for memory.oom_control and notifications/stats?

        vvasudev Varun Vasudev added a comment -

        Could you move these constants to CGroupsHandler? There are already some cgroups parameter constants defined there.

        Fixed.

        IMO, if the behavior here is unpredictable, we should simply error out here if both are enabled.

        Fixed.

        We should make the fraction configurable, I think. What are the implications of the soft limit?

        Fixed. It's a lower limit useful when there's memory contention.

        Since we are skipping changes to yarn-default.xml (based on changes I see in TestYarnConfigurationFields), these should be marked @Private, similar to how network/disk config settings are annotated?

        Fixed.

        Thinking aloud here: should we add support in some form for memory.oom_control and notifications/stats?

        We should take that up as a separate JIRA.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 8m 21s trunk passed
        +1 compile 2m 7s trunk passed with JDK v1.8.0_66
        +1 compile 2m 19s trunk passed with JDK v1.7.0_85
        +1 checkstyle 0m 28s trunk passed
        +1 mvnsite 1m 0s trunk passed
        +1 mvneclipse 0m 27s trunk passed
        +1 findbugs 2m 27s trunk passed
        +1 javadoc 1m 2s trunk passed with JDK v1.8.0_66
        +1 javadoc 3m 15s trunk passed with JDK v1.7.0_85
        +1 mvninstall 0m 56s the patch passed
        +1 compile 2m 3s the patch passed with JDK v1.8.0_66
        +1 javac 2m 3s the patch passed
        +1 compile 2m 19s the patch passed with JDK v1.7.0_85
        +1 javac 2m 19s the patch passed
        -1 checkstyle 0m 29s Patch generated 7 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 240, now 246).
        +1 mvnsite 1m 1s the patch passed
        +1 mvneclipse 0m 25s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 42s the patch passed
        +1 javadoc 0m 58s the patch passed with JDK v1.8.0_66
        +1 javadoc 3m 18s the patch passed with JDK v1.7.0_85
        +1 unit 0m 25s hadoop-yarn-api in the patch passed with JDK v1.8.0_66.
        +1 unit 9m 6s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
        +1 unit 0m 26s hadoop-yarn-api in the patch passed with JDK v1.7.0_85.
        +1 unit 9m 35s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85.
        +1 asflicense 0m 23s Patch does not generate ASF License warnings.
        56m 58s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12776140/YARN-1856.002.patch
        JIRA Issue YARN-1856
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 51136a239c06 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 01a641b
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/9888/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
        JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9888/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
        Max memory used 76MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/9888/console

        This message was automatically generated.

        sidharta-s Sidharta Seethana added a comment -

        Hi Varun Vasudev, some comments below on the latest version of the patch:

        String CGROUP_PARAM_HARD_LIMIT = "limit_in_bytes";
        String CGROUP_PARAM_SOFT_LIMIT = "soft_limit_in_bytes";
        String CGROUP_PARAM_SWAPPINESS = "swappiness";
        

        The constants should have ‘MEMORY’ in their names. For example, CGROUP_PARAM_HARD_LIMIT is better named CGROUP_PARAM_MEMORY_HARD_LIMIT in order to avoid future collisions. This is similar to how BLKIO is used in the previous line (classid should be fixed at some point too).

        @Private
        public static final String NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC =
            NM_MEMORY_RESOURCE_PREFIX + "cgroups.soft-limit-percentage";
        @Private
        public static final float DEFAULT_NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC =
            0.9f;
        
        softLimitPerc = conf.getFloat(
            YarnConfiguration.NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC,
            YarnConfiguration.DEFAULT_NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC);
        if (softLimitPerc < 0.0f || softLimitPerc > 100.0f) {
          throw new ResourceHandlerException(
              "Illegal value '" + softLimitPerc + "' "
                  + YarnConfiguration.NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC
                  + ". Value must be between 0 and 100.");
        }
        

        Is the soft limit conf setting meant to represent a percentage or is it a fraction between 0 and 1? From the default value of 0.9f and the application of the soft limit it appears to be a fraction, but the name of the setting and its validation check seem to indicate that it is meant to be a percentage. This needs to be fixed.

        vvasudev Varun Vasudev added a comment -

        The constants should have ‘MEMORY’ in their names. For example, CGROUP_PARAM_HARD_LIMIT is better named as CGROUP_PARAM_MEMORY_HARD_LIMIT in order to avoid future collisions. This is similar to how BLKIO is used in the previous line (classid should be fixed at some point too)

        Fixed.

        Is the soft limit conf setting meant to represent a percentage or is it a fraction between 0 and 1? From the default value of 0.9f and the application of the soft limit it appears to be a fraction, but the name of the setting and its validation check seem to indicate that it is meant to be a percentage. This needs to be fixed.

        Good catch. It's meant to be a percentage. Fixed.
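        (To make the distinction concrete: applied as a percentage, the soft limit computation needs a divide by 100. A hedged sketch with made-up values, not code from the patch.)

        // Sketch only: the setting is a percentage in [0, 100].
        public class SoftLimitExample {
          public static void main(String[] args) {
            long containerMemoryMb = 2048;
            float softLimitPercentage = 90.0f;
            long softLimitMb =
                (long) (containerMemoryMb * softLimitPercentage / 100.0f);
            System.out.println(softLimitMb); // prints 1843
          }
        }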

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 8m 50s trunk passed
        +1 compile 2m 21s trunk passed with JDK v1.8.0_66
        +1 compile 2m 27s trunk passed with JDK v1.7.0_85
        +1 checkstyle 0m 32s trunk passed
        +1 mvnsite 1m 4s trunk passed
        +1 mvneclipse 0m 28s trunk passed
        +1 findbugs 2m 35s trunk passed
        +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66
        +1 javadoc 3m 23s trunk passed with JDK v1.7.0_85
        +1 mvninstall 1m 0s the patch passed
        +1 compile 2m 22s the patch passed with JDK v1.8.0_66
        +1 javac 2m 22s the patch passed
        +1 compile 2m 29s the patch passed with JDK v1.7.0_85
        +1 javac 2m 29s the patch passed
        -1 checkstyle 0m 31s Patch generated 8 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 240, now 247).
        +1 mvnsite 1m 4s the patch passed
        +1 mvneclipse 0m 27s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 55s the patch passed
        +1 javadoc 1m 7s the patch passed with JDK v1.8.0_66
        +1 javadoc 3m 19s the patch passed with JDK v1.7.0_85
        +1 unit 0m 28s hadoop-yarn-api in the patch passed with JDK v1.8.0_66.
        +1 unit 9m 14s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
        +1 unit 0m 28s hadoop-yarn-api in the patch passed with JDK v1.7.0_85.
        +1 unit 9m 31s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85.
        -1 asflicense 0m 21s Patch generated 3 ASF License warnings.
        59m 27s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12776258/YARN-1856.003.patch
        JIRA Issue YARN-1856
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 3e83da93fdd3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / fc47084
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/9897/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
        JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9897/testReport/
        asflicense https://builds.apache.org/job/PreCommit-YARN-Build/9897/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
        Max memory used 76MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/9897/console

        This message was automatically generated.

        vvasudev Varun Vasudev added a comment -

        The license warnings aren't from the patch. They're coming from resourcemanager test files.

        sidharta-s Sidharta Seethana added a comment -

        Varun Vasudev, there is still an issue with the handling of the soft limit percentage. Isn't there a divide by 100 missing?

            long softLimit =
                (long) (container.getResource().getMemory() * softLimitPerc);
        

        The test code below needs to be updated too - instead of specifying the value of the soft limit percentage in the test code, maybe we should use DEFAULT_NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC? It also looks like the validation of the memory value is not happening correctly below. You could use Mockito's eq() to verify argument values.

          verify(mockCGroupsHandler, times(1))
                .updateCGroupParam(CGroupsHandler.CGroupController.MEMORY, id,
                    CGroupsHandler.CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES,
                    String.valueOf((int) (memory * 0.9)) + "M");
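        (A hedged sketch of the eq() suggestion, reusing the mockCGroupsHandler, id, and memory variables from the test quoted above; note that Mockito requires either all or none of a call's arguments to use matchers.)

          // Assumes static import of org.mockito.Mockito.* (eq, verify, times).
          // Wrap every expected argument in eq() so Mockito checks the values.
          verify(mockCGroupsHandler, times(1))
              .updateCGroupParam(eq(CGroupsHandler.CGroupController.MEMORY),
                  eq(id), eq(CGroupsHandler.CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES),
                  eq(String.valueOf((int) (memory * 0.9)) + "M"));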
        
        sidharta-s Sidharta Seethana added a comment -

        Ugh. IDE snafu - somehow ended up looking at an older version of the patch.

        +1 on the latest version of the patch.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Quick comments on the patch:

        General

        • Should add all the configs to yarn-default.xml, saying they are still early configs?
        • Should update the documentation of pmem-check-enabled, vmem-check-enabled configs in code and yarn-default.xml to denote their relation to resource.memory.enabled.
        • Actually, given the existing memory monitoring mechanism, NM_MEMORY_RESOURCE_ENABLED is in reality already true when pmem/vmem checks are enabled. We need to reconcile the old and new configs somehow. Maybe memory is always enabled, but if vmem/pmem configs are enabled, use the old handler, otherwise use the new one? Thinking out aloud.
        • Do the soft and hard limits also somehow logically relate to the pmem-vmem-ratio? If so, we should hint at that in the documentation.
        • Swappiness seems like a cluster configuration defaulting to zero. So far, this has been an implicit contract with our users; good to document this in yarn-default.xml as well.

        Code comments

        • ResourceHandlerModule
          • Formatting of new code is a little off: the declaration of getCgroupsMemoryResourceHandler(). There are other occurrences like this earlier in that class in this patch; you may want to fix those.
          • BUG! getCgroupsMemoryResourceHandler() incorrectly locks DiskResourceHandler instead of MemoryResourceHandler.
        • CGroupsMemoryResourceHandlerImpl
          • What is this doing? {{ CGroupsHandler.CGroupController MEMORY = CGroupsHandler.CGroupController.MEMORY; }} Is it forcing a class-load or something? Not sure if this is needed. If this is needed, you may want to add a comment here.
        • NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC -> NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERCENTAGE. Similarly the default constant.
        • CGROUP_PARAM_MEMORY_HARD_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SWAPPINESS can all be static and final.
        vvasudev Varun Vasudev added a comment -

        Should add all the configs to yarn-default.xml, saying they are still early configs?

        I don't think we've figured out how to specify the various resource isolation pieces from a config perspective. I'd like to keep them private for now, and I'll file a follow-up JIRA to document the configs once we've figured it out. The remaining points all relate to this, so I'll address them as part of that JIRA.

        ResourceHandlerModule - Formatting of new code is a little off: the declaration of getCgroupsMemoryResourceHandler(). There are other occurrences like this in that class before in this patch, you may want to fix those.

        Fixed.

        BUG! getCgroupsMemoryResourceHandler() incorrectly locks DiskResourceHandler instead of MemoryResourceHandler.
        CGroupsMemoryResourceHandlerImpl

        Not a bug, but a bad typo nonetheless. Fixed.

        What is this doing? {{ CGroupsHandler.CGroupController MEMORY = CGroupsHandler.CGroupController.MEMORY; }} Is it forcing a class-load or something? Not sure if this is needed. If this is needed, you may want to add a comment here.

        No, just a shorthand to avoid writing out the fully qualified name every time.

        NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC -> NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERCENTAGE. Similarly the default constant.

        Fixed.

        CGROUP_PARAM_MEMORY_HARD_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SWAPPINESS can all be static and final.

        Interface variables are public static final by default. Any reason you want to add static final?
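        (For reference, fields declared in a Java interface are implicitly public static final, which is the point being made above; the interface name here is hypothetical.)

        interface Example {
          // Implicitly public static final; adding the modifiers is redundant.
          String CGROUP_PARAM_MEMORY_SWAPPINESS = "swappiness";
        }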

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 9m 46s trunk passed
        +1 compile 3m 14s trunk passed with JDK v1.8.0_66
        +1 compile 2m 49s trunk passed with JDK v1.7.0_91
        +1 checkstyle 0m 38s trunk passed
        +1 mvnsite 1m 11s trunk passed
        +1 mvneclipse 0m 29s trunk passed
        +1 findbugs 2m 54s trunk passed
        +1 javadoc 1m 17s trunk passed with JDK v1.8.0_66
        +1 javadoc 3m 35s trunk passed with JDK v1.7.0_91
        +1 mvninstall 1m 4s the patch passed
        +1 compile 3m 11s the patch passed with JDK v1.8.0_66
        +1 javac 3m 11s the patch passed
        +1 compile 2m 47s the patch passed with JDK v1.7.0_91
        +1 javac 2m 47s the patch passed
        -1 checkstyle 0m 36s Patch generated 11 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 240, now 248).
        +1 mvnsite 1m 9s the patch passed
        +1 mvneclipse 0m 29s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 3m 13s the patch passed
        +1 javadoc 1m 21s the patch passed with JDK v1.8.0_66
        +1 javadoc 3m 35s the patch passed with JDK v1.7.0_91
        +1 unit 0m 34s hadoop-yarn-api in the patch passed with JDK v1.8.0_66.
        +1 unit 10m 26s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
        +1 unit 0m 33s hadoop-yarn-api in the patch passed with JDK v1.7.0_91.
        +1 unit 10m 7s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91.
        +1 asflicense 0m 30s Patch does not generate ASF License warnings.
        67m 0s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12776756/YARN-1856.004.patch
        JIRA Issue YARN-1856
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 6109c62a0d1b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 132478e
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/9923/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
        JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9923/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
        Max memory used 76MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/9923/console

        This message was automatically generated.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        The style warnings are mostly non-fixable, what with long files and package names.

        +1 for splitting out the yarn-default.xml changes.

        The latest patch looks good to me, +1, checking this in.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Committed this to trunk and branch-2. Thanks Varun!

        Tx for all the reviews, Sidharta Seethana!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8986 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8986/)
        YARN-1856. Added cgroups based memory monitoring for containers as (vinodkv: rev 4e7d32c0db69882cde854ef581056142a997c005)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/MemoryResourceHandler.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsMemoryResourceHandlerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java
        • hadoop-yarn-project/CHANGES.txt
        kasha Karthik Kambatla added a comment -

        Thanks Varun Vasudev for working on this, Sidharta Seethana and Vinod Kumar Vavilapalli for the reviews. Excited to see this land.

        Just checking - is there a JIRA for using memory.oom_control? If we don't disable oom_control, using the new cgroups-based monitoring/enforcing would be a lot stricter than the proc-fs based checks and could lead to several task/job failures on existing clusters. OTOH, we might want to enable oom_control for opportunistic containers to be used in YARN-2877 and YARN-1011. If there is no JIRA yet and you guys are caught up, I am happy to file one and work on it.

        vvasudev Varun Vasudev added a comment -

        Karthik Kambatla - we haven't provided a flag for using oom_control, but we did provide a control to set swappiness (which is currently a config setting).

        Ideally, oom_control and swappiness would be set by the AM/YARN client and should be container-specific settings.

        In general, we need an API to set container executor specific settings - we've seen a need for this when adding Docker support and now for CGroups settings as well. If you'd like to work on it, is it possible to come up with an abstraction that'll solve the Docker issues as well? Sidharta Seethana and I can help provide context on the Docker use case.

        kasha Karthik Kambatla added a comment -

        Ideally oom_control, swappiness would be set by the AM/YARN client and should be container specific settings.

        If we don't disable oom_control, wouldn't the current implementation kill containers as soon as they spike their usage over the configured hard limit, which appears to be the container size? I feel this is too aggressive, especially considering how easily a delayed GC could cause this. No?

        I see your point about an application deciding whether its containers should be paused/killed. I think the default should be paused, i.e., disabled.

        In general, we need an API to set container executor specific settings - we've seen a need for this when adding Docker support and now for CGroups settings as well.

        Would like to understand this better. Maybe we should take this to another JIRA. I am open to discussing this offline before filing this JIRA and posting our thoughts there.

        kasha Karthik Kambatla added a comment -

        Was catching up on YARN-3 (the JIRA that added cgroups) to see why we decided not to use it for enforcing memory. Bikas Saha has some valid points on not letting the kernel (through cgroups) kill processes that go over their allocated limits.

        To get the best of both worlds: I feel we should disable oom_control so the processes are paused but not killed. Thoughts?
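        (For context, in cgroup v1 the per-cgroup OOM killer is disabled by writing 1 to memory.oom_control, which makes tasks that hit the hard limit block instead of being killed. A hedged sketch; the path and class name are illustrative, not from this JIRA.)

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Paths;

        // Illustrative only: pause-instead-of-kill behaviour for a container's
        // memory cgroup.
        public class OomControl {
          public static void disableOomKiller(String containerId) throws IOException {
            Files.write(Paths.get("/cgroup/memory/hadoop-yarn/" + containerId
                + "/memory.oom_control"), "1".getBytes());
          }
        }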

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Varun Vasudev / Karthik Kambatla, seems like there are a couple of key proposals here; let's fork them off into separate tickets so they get the attention they deserve.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Varun Vasudev / Karthik Kambatla, can you file tickets for some of your proposals above so that we don't drop them on the floor? Tx.

        kasha Karthik Kambatla added a comment -

        Vinod Kumar Vavilapalli - I have filed YARN-4599 for the same.


          People

          • Assignee: vvasudev Varun Vasudev
          • Reporter: kasha Karthik Kambatla
          • Votes: 0
          • Watchers: 32
