Hadoop YARN
  YARN-160

nodemanagers should obtain cpu/memory values from underlying OS

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: 2.7.0
    • Component/s: nodemanager
    • Labels: None

      Description

      As mentioned in YARN-2:

      NM memory and CPU configs

      Currently these values are coming from the config of the NM; we should be able to obtain those values from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of memory/CPU not to be made available as YARN resources); this would allow reserving memory/CPU for the OS and other services running outside of YARN containers.
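
      A rough sketch of such an interface (hypothetical; the name and methods here are invented for illustration, not taken from any attached patch):

        public interface NodeResourceDetector {
          /** Total physical memory of the node in MB, as reported by the OS. */
          long getPhysicalMemoryMB();

          /** Number of processors reported by the OS. */
          int getNumProcessors();

          /** Memory in MB reserved for the OS and non-YARN services. */
          long getMemoryOffsetMB();

          /** Processors reserved for the OS and non-YARN services. */
          int getProcessorOffset();
        }

        // The NM would then advertise, for example:
        //   memoryMB = getPhysicalMemoryMB() - getMemoryOffsetMB();
        //   vcores   = getNumProcessors() - getProcessorOffset();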

      1. apache-yarn-160.3.patch
        39 kB
        Varun Vasudev
      2. apache-yarn-160.2.patch
        38 kB
        Varun Vasudev
      3. apache-yarn-160.1.patch
        24 kB
        Varun Vasudev
      4. apache-yarn-160.0.patch
        24 kB
        Varun Vasudev

          Activity

          Varun Vasudev added a comment -

          HADOOP_HEAPSIZE_MAX in trunk. HADOOP_HEAPSIZE was deprecated.

          Thanks for pointing this out, Allen. I'll provide a patch for trunk and one for branch-2 when I address Vinod's comments.

          yarn.nodemanager.count-logical-processors-as-cores: Not sure of the use for this. On Linux, shouldn't we simply use the returned numCores if they are valid? And fall back to numProcessors?

          Some people prefer to count hyperthreads as a CPU and some don't. This lets users choose.

          yarn.nodemanager.enable-hardware-capability-detection: I think specifying the capabilities to be -1 is already a way to trigger this automatic detection; let's simply drop the flag and assume it to be true all the time?

          Junping felt we should add it to cover upgrade scenarios. What do you think?

          We already have resource.percentage-physical-cpu-limit for CPUs - YARN-2440. How about simply adding a resource.percentage-pmem-limit instead of making it a magic number in the code? Of course, we can have a default reserved percentage.

          I think resource.percentage-pmem-limit should be analogous to resource.percentage-physical-cpu-limit in that it sets the limit as a percentage of total memory. What about something like "yarn.nodemanager.default-percentage-pmem-limit"?
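
          For illustration, with the suggested property (the name is hypothetical at this point) set to 80 on a 64 GB node, the math would work out as follows:

            // Hypothetical sketch of a percentage-based pmem limit; the
            // property name follows the suggestion above and the node size
            // is assumed.
            float pmemLimitPercent = 80.0f; // yarn.nodemanager.default-percentage-pmem-limit
            long physicalMemoryMB = 65536;  // assumed 64 GB node
            // 0.8 * 65536 = 52428 MB offered to YARN containers
            int containerMemoryMB = (int) (physicalMemoryMB * pmemLimitPercent / 100f);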

          Vinod Kumar Vavilapalli added a comment -

          Quick comments on the patch:

          • LinuxResourceCalculatorPlugin: numPhysicalSockets is not used anywhere?
          • WindowsResourceCalculatorPlugin: Why is num-cores set equal to num-processors?
          • yarn-default.xml: Change "it will set the X to Y" to be "it will set X to Y by default"
          • yarn.nodemanager.count-logical-processors-as-cores: Not sure of the use for this. On Linux, shouldn't we simply use the returned numCores if they are valid? And fall back to numProcessors?
          • yarn.nodemanager.enable-hardware-capability-detection: I think specifying the capabilities to be -1 is already a way to trigger this automatic detection; let's simply drop the flag and assume it to be true all the time?
          • CGroupsLCEResourceHandler: The log message 'LOG.info("node vcores = " + nodeVCores);' is printed for every container launch.
          • Should we enforce somewhere that numCores >= numProcessors, if not that it is always a multiple?
                 int containerPhysicalMemoryMB =
                      (int) (0.8f * (physicalMemoryMB - (2 * hadoopHeapSizeMB)));
          

          We already have resource.percentage-physical-cpu-limit for CPUs - YARN-2440. How about simply adding a resource.percentage-pmem-limit instead of making it a magic number in the code? Of course, we can have a default reserved percentage.

          Allen Wittenauer added a comment -

          RAM-2*HADOOP_HEAPSIZE

          HADOOP_HEAPSIZE_MAX in trunk. HADOOP_HEAPSIZE was deprecated.

          Varun Vasudev added a comment -

          The findbugs warnings are unrelated to the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12690562/apache-yarn-160.3.patch
          against trunk revision 788ee35.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 29 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-gridmix hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6269//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6269//artifact/patchprocess/newPatchFindbugsWarningshadoop-gridmix.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6269//console

          This message is automatically generated.

          Varun Vasudev added a comment -

          Uploaded a new patch - apache-yarn-160.3.patch.
          1. Rebased to trunk.
          2. Added a flag that allows users to turn off detection of the underlying hardware.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12664137/apache-yarn-160.2.patch
          against trunk revision 1556f86.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5953//console

          This message is automatically generated.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12664137/apache-yarn-160.2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-gridmix hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4721//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4721//console

          This message is automatically generated.

          Varun Vasudev added a comment -

          Comments from Jason Lowe in YARN-2440 about this feature led to some more changes. The latest patch introduces some new config variables:
          1. yarn.nodemanager.containers-cpu-cores - the number of cores to be used for YARN containers. By default we use all cores.
          2. yarn.nodemanager.containers-cpu-percentage - the percentage of overall CPU to be used for YARN containers. By default we use all of the CPU.
          3. yarn.nodemanager.pcores-vcores-multiplier - a multiplier to convert pcores to vcores. By default it is 1. This can be used on clusters with heterogeneous hardware to have more containers run on faster CPUs (see the sketch at the end of this comment).
          4. yarn.nodemanager.count-logical-processors-as-cores - a flag to determine whether hyperthreads should be counted as cores. By default it is true.

          There's some common code between YARN-2440 and this patch. Depending on which one gets committed first, I'll change the patch appropriately.
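
          To make the pcores-to-vcores conversion in item 3 concrete, a sketch with assumed values (these property names were still proposals at this point):

            int physicalCores = 8;   // assumed, as detected from the OS
            float multiplier = 2.0f; // yarn.nodemanager.pcores-vcores-multiplier
            // an 8-core node with a 2.0 multiplier advertises 16 vcores
            int vcores = (int) (physicalCores * multiplier);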

          Varun Vasudev added a comment -

          Junping Du

          Both physical id and core id are not guaranteed to be present in /proc/cpuinfo (please see my local VM's info below). We may use the processor number instead in case these ids are 0 (like we did on Windows). Again, this weakens my confidence that this automatic way of getting CPU/memory resources should happen by default (I'm not sure if there are any cross-platform issues). Maybe a safer way here is to keep the previous default behavior (with some static setting) with an extra config to enable this. We can wait for this feature to become more stable before changing the default behavior.

          processor	: 0
          vendor_id	: GenuineIntel
          cpu family	: 6
          model		: 70
          model name	: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
          stepping	: 1
          cpu MHz		: 2295.265
          cache size	: 6144 KB
          fpu		: yes
          fpu_exception	: yes
          cpuid level	: 13
          wp		: yes
          flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc up arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi ept vpid fsgsbase smep
          bogomips	: 4590.53
          clflush size	: 64
          cache_alignment	: 64
          address sizes	: 40 bits physical, 48 bits virtual
          power management:
          

          In the example you gave, where we have processors listed but no physical id or core id entries, numProcessors will be set to the number of processor entries and numCores will be set to 1. From the diff:

          +      numCores = 1;
          

          There is also a test case to ensure this behaviour.

          In addition, cluster administrators can decide whether the NodeManager should report numProcessors or numCores by toggling yarn.nodemanager.resource.count-logical-processors-as-vcores, which is true by default. In the VM example above, the NodeManager will by default report vcores as the number of processor entries in /proc/cpuinfo. If yarn.nodemanager.resource.count-logical-processors-as-vcores is set to false, the NodeManager will report vcores as 1 (since there are no physical id or core id entries).
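
          A minimal sketch of that decision, assuming conf is a Hadoop Configuration and numProcessors/numCores come from parsing /proc/cpuinfo (variable names invented here, not the actual patch code):

            boolean countLogicalAsVcores = conf.getBoolean(
                "yarn.nodemanager.resource.count-logical-processors-as-vcores", true);
            // numProcessors counts logical processors; numCores counts physical
            // cores and is 1 when physical id / core id entries are absent.
            int vcores = countLogicalAsVcores ? numProcessors : numCores;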

          Junping Du added a comment -

          Thanks for addressing my comments, Varun Vasudev!

          +  private static final Pattern PHYSICAL_ID_FORMAT =
          +      Pattern.compile("^physical id[ \t]*:[ \t]*([0-9]*)");
          +  private static final Pattern CORE_ID_FORMAT =
          +      Pattern.compile("^core id[ \t]*:[ \t]*([0-9]*)");
          

          Both physical id and core id are not guaranteed to be present in /proc/cpuinfo (please see my local VM's info below). We may use the processor number instead in case these ids are 0 (like we did on Windows). Again, this weakens my confidence that this automatic way of getting CPU/memory resources should happen by default (I'm not sure if there are any cross-platform issues). Maybe a safer way here is to keep the previous default behavior (with some static setting) with an extra config to enable this. We can wait for this feature to become more stable before changing the default behavior.

          processor	: 0
          vendor_id	: GenuineIntel
          cpu family	: 6
          model		: 70
          model name	: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
          stepping	: 1
          cpu MHz		: 2295.265
          cache size	: 6144 KB
          fpu		: yes
          fpu_exception	: yes
          cpuid level	: 13
          wp		: yes
          flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc up arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi ept vpid fsgsbase smep
          bogomips	: 4590.53
          clflush size	: 64
          cache_alignment	: 64
          address sizes	: 40 bits physical, 48 bits virtual
          power management:
          
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12663362/apache-yarn-160.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          -1 release audit. The applied patch generated 3 release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-gridmix hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4683//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4683//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4683//console

          This message is automatically generated.

          Varun Vasudev added a comment -

          Uploaded a new patch to address Junping Du's comments.

          Junping Du added a comment -

          The patch supports the old way.

          Thanks for the clarification. Yes, I saw the details of getYARNContainerMemoryMB(), which seems to honor the previous NM resource configuration.

          Isn't calculating the values from the hardware a better option?

          Agree. But if the calculated result is not reasonable (like 0 or a negative value), shall we use the previous NM default value instead? At least, experienced users (especially those testing) already have some expectations even when they don't set any resource values here.
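
          As a hypothetical sketch of that fallback, with 8192 MB as the old static default mentioned elsewhere in this thread:

            // Use the detected value only when it is sane; otherwise fall back
            // to the historical static default of 8192 MB.
            int containerMemoryMB = (detectedMemoryMB > 0) ? detectedMemoryMB : 8192;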

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12662475/apache-yarn-160.0.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-gridmix hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4664//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4664//console

          This message is automatically generated.

          Varun Vasudev added a comment -

          Junping Du

          The old way of configuring NM resources is still useful, especially when there are other agents running (like an HBase RegionServer). Thus, users need the flexibility to calculate resources themselves in some cases, so we should provide a new option instead of removing the old way completely.

          The patch supports the old way. If a user has set values for memory and vcores, they're used without looking at the underlying hardware. I've added test cases to verify that behaviour as well. Have I missed a use case?

          Given this is a new feature, we shouldn't change a cluster's behavior under its old configuration from an upgrade perspective. We should keep the previous configuration working as usual, especially when users rely on some default settings.

          There are two scenarios here:
          1. A configuration file with custom settings for memory and CPU - nothing will change for these users.
          2. A configuration file with no settings for memory and CPU - in this case, the memory and CPU resources will be calculated from the underlying hardware instead of being set to 8192 and 8 respectively. Isn't calculating the values from the hardware a better option? If people feel strongly about sticking to 8192 and 8, I don't have any problem changing this, but it seems a bit odd.

          Junping Du added a comment -

          Thanks, Varun Vasudev, for working on this. I just took a quick glance; a few comments:

          • The old way of configuring NM resources is still useful, especially when there are other agents running (like an HBase RegionServer). Thus, users need the flexibility to calculate resources themselves in some cases, so we should provide a new option instead of removing the old way completely.
          • Given this is a new feature, we shouldn't change a cluster's behavior under its old configuration from an upgrade perspective. We should keep the previous configuration working as usual, especially when users rely on some default settings.
          Varun Vasudev added a comment -

          Patch to automatically calculate CPU and memory from the OS. In the case of CPU, I've added a flag to allow admins to decide if they want to count hyperthreads as vcores or not (by default they are counted as vcores). The flag currently works on Linux only.

          In the case of memory, I calculate the memory for containers as 80% of (RAM - 2*HADOOP_HEAPSIZE), to account for memory used by the DataNode and the NodeManager.
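
          Worked through with assumed numbers, that formula gives:

            // Assumed: a 64 GB node with HADOOP_HEAPSIZE of 1024 MB.
            int physicalMemoryMB = 65536;
            int hadoopHeapSizeMB = 1024;
            // 0.8 * (65536 - 2 * 1024) = 0.8 * 63488 = 50790 MB for containers
            int containerPhysicalMemoryMB =
                (int) (0.8f * (physicalMemoryMB - (2 * hadoopHeapSizeMB)));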

          I've also changed the default behaviour to use the calculated values instead of the 8 (CPU) and 8192 (memory) defaults we have been using until now.

          Feedback would be welcome.

          Alejandro Abdelnur added a comment -

          I have my hands full at the moment, so I won't be able to take this one on for a while.

          Making it unassigned in case somebody wants to take a stab at it.

          Timothy St. Clair added a comment -

          I think the prudent approach would be to evaluate hwloc and its community, and determine if it meets the internal needs of YARN. For risk mitigation purposes, I think having a plugin abstraction layer as a fallback would also be wise.

          I did notice there are also Java bindings for hwloc (https://launchpad.net/jhwloc/).

          Arun C Murthy added a comment -

          Alejandro Abdelnur, is there a chance you plan on working on this? I'd like to get this into 2.3.0 if possible. Thanks!

          Arun C Murthy added a comment -

          I'm particularly interested in getting this done in light of YARN-1024 - we should look for something we can normalize down to YVC (or ECU).

          Arun C Murthy added a comment -

          I'd like to push to get this done asap.

          [~t.st.clair] thanks for the pointer to hwloc. In your opinion, should we directly use hwloc in YARN rather than inventing our own? What has been your experience with hwloc? It does have an appropriate license (BSD - http://www.open-mpi.org/projects/hwloc/license.php) ...

          Timothy St. Clair added a comment -

          If it's possible to tag along on the development of this one, I would be interested in the approach. IMHO, referencing existing solutions gauges the baseline:

          Ref:
          http://www.open-mpi.org/projects/hwloc/
          http://www.rce-cast.com/Podcast/rce-33-hwloc-portable-hardware-locality.html
          http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

          Radim Kolar added a comment -

          My proposal is to build an entirely new framework for resource management - MAPREDUCE-4256.

          Alejandro Abdelnur added a comment -

          Radim, thanks for pointing that out. It seems this is not wired up for the NMs to report such info back to the RM: looking at the code, NodeManager creates a NodeStatusUpdater using values from configuration files, and only ContainersMonitorImpl reads the underlying memory. So this JIRA would be to get the NodeStatusUpdater to use a resource calculator. Makes sense?
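
          A rough sketch of that wiring (treat the factory and accessor names as assumptions about the plugin API, not exact signatures):

            // Have NodeStatusUpdater consult a ResourceCalculatorPlugin for node
            // resources instead of reading only static configuration values.
            ResourceCalculatorPlugin plugin =
                ResourceCalculatorPlugin.getResourceCalculatorPlugin(null, conf);
            long memoryMB = plugin.getPhysicalMemorySize() / (1024L * 1024);
            int vcores = plugin.getNumProcessors();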

          Radim Kolar added a comment -

          There is an interface for obtaining these values from the OS; it's called the ResourceCalculatorPlugin. It might be good to do autodetection, but I need the ability to override these values in config.


             People

             • Assignee: Varun Vasudev
             • Reporter: Alejandro Abdelnur
             • Votes: 0
             • Watchers: 21