Hadoop YARN / YARN-810

Support CGroup ceiling enforcement on CPU

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0-beta, 2.0.5-alpha
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:

      Description

      Problem statement:

      YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. Containers are then allowed to request vcores between the minimum and maximum defined in the yarn-site.xml.

      In the case where a single-threaded container requests 1 vcore, with a pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of the core it's using, provided that no other container is also using it. This happens even though the only guarantee that YARN/CGroups is making is that the container will get "at least" 1/4th of the core.

      If a second container then comes along, the second container can take resources from the first, provided that the first container is still getting at least its fair share (1/4th).

      There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available.

      Here's an RFC that describes the problem in more detail:

      http://lwn.net/Articles/336127/

      Solution:

      As it happens, when CFS is used in combination with CGroups, you can enforce a ceiling using two files in cgroups:

      cpu.cfs_quota_us
      cpu.cfs_period_us
      

      The usage of these two files is documented in more detail here:

      https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
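
      For illustration, here is roughly how the two knobs combine (a minimal sketch, not from the original report; it assumes the cgroup v1 cpu hierarchy mounted at /cgroup as in the transcripts below, and the container path is made up):

          # Hypothetical container cgroup; substitute a real one.
          CG=/cgroup/cpu/hadoop-yarn/container_example

          # 50000us of runtime per 100000us period => capped at 50% of one core.
          echo 100000 > $CG/cpu.cfs_period_us
          echo 50000  > $CG/cpu.cfs_quota_us

          # A quota larger than the period spans multiple cores: 200000/100000 => up to 2 cores.
          echo 200000 > $CG/cpu.cfs_quota_us

          # -1 removes the ceiling again (the kernel default).
          echo -1 > $CG/cpu.cfs_quota_us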

      Testing:

      I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it behaves as described above (it is a soft cap, and allows containers to use more than they asked for). I then tested CFS CPU quotas manually with YARN.

      First, you can see that CFS is in use in the CGroup, based on the file names:

          [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
          total 0
          -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
          drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_000002
          -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
          -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
          -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
          -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
          -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
          -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
          -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
          -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
          [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
          100000
          [criccomi@eat1-qa464 ~]$ sudo -u app cat /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
          -1
      

      Oddly, it appears that the cfs_period_us is set to .1s, not 1s.

      We can place hard limits on processes. I have process 4370 running YARN container container_1371141151815_0003_01_000003 on a host. By default, it's running at ~300% cpu usage.

                                                  CPU
          4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
      

      When I set the CFS quota:

          echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_000003/cpu.cfs_quota_us
                                                   CPU
          4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
      

      It drops to 1% usage, and you can see the box has room to spare:

          Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si, 0.0%st
      

      Turning the quota back to -1:

          echo -1 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_000003/cpu.cfs_quota_us
      

      Burns the cores again:

          Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 0.0%st
                                                  CPU
          4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
      

      On my dev box, I was testing CGroups by running a python process eight times to burn through all the cores, since CGroups was behaving as described above (giving extra CPU to the process, even with a cpu.shares limit). Toggling cfs_quota_us seems to enforce a hard limit.

      Implementation:

      What do you guys think about introducing a variable to YarnConfiguration:

      yarn.nodemanager.linux-container.executor.cgroups.cpu-ceiling-enforcement

      The default would be false. Setting it to true would cause YARN's LCE to set:

      cpu.cfs_quota_us=(container-request-vcores/nm-vcore-to-pcore-ratio) * 1000000
      cpu.cfs_period_us=1000000
      

      For example, if a container asks for 2 vcores, and the vcore:pcore ratio is 4, you'd get:

      cpu.cfs_quota_us=(2/4) * 1000000 = 500000
      cpu.cfs_period_us=1000000
      

      This would cause CFS to cap the process at 50% of clock cycles.
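
      To make the arithmetic explicit, here is a small sketch of the proposed quota computation (illustrative only; the helper name is made up):

          PERIOD_US=1000000

          # Proposed formula: quota = (requested vcores / vcore:pcore ratio) * period
          compute_quota_us() {
            local requested_vcores=$1 vcore_pcore_ratio=$2
            echo $(( requested_vcores * PERIOD_US / vcore_pcore_ratio ))
          }

          compute_quota_us 2 4   # => 500000 (the example above)
          compute_quota_us 1 4   # => 250000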

      What do you guys think?

      1. Does this seem like a reasonable request? We have some use-cases for it.
      2. It's unclear to me how cpu.shares interacts with cpu.cfs_*. I think the ceiling is hard, no matter what shares is set to. I assume shares only comes into play if the CFS quota has not been reached, and the process begins competing with others for CPU resources.
      3. Should this be an LCE config (yarn.nodemanager.linux-container-executor), or should it be a generic scheduler config (yarn.scheduler.enforce-ceiling-vcores)?

      1. YARN-810.patch
        53 kB
        Wei Yan
      2. YARN-810.patch
        52 kB
        Wei Yan
      3. YARN-810-3.patch
        60 kB
        Wei Yan
      4. YARN-810-4.patch
        60 kB
        Wei Yan
      5. YARN-810-5.patch
        39 kB
        Wei Yan
      6. YARN-810-6.patch
        42 kB
        Wei Yan

        Issue Links

          Activity

          sandyr Sandy Ryza added a comment -

          Chris Riccomini, I'm intending to remove the vcore-pcore ratio in YARN-782. If we did this and set a % ceiling on the amount of CPU that the sum of all containers can occupy, would that also satisfy your use case?

          sandyr Sandy Ryza added a comment -

          a configurable % ceiling I mean.

          criccomini Chris Riccomini added a comment -

          Hey Sandy,

          If I understand you correctly, not quite. I think what you're saying is, if we set a % ceiling that all containers combined could use (say 80%), then a single container running would get 80% usage, but if two containers were running, they'd get roughly 40% each, right?

          What I'm saying is, if one container is running, it gets a maximum 40% of a core (even if the other 60% is available). If two are running, they still both get 40% of a core.

          We have a situation where we want very predictable CPU usage. We don't want a container to run happily because it happened to be over-provisioned by luck, and then, when a second container gets allocated on the box, suddenly be throttled back to its allocated CPU share and slow way down. We'd rather it be very predictable, so we know up front whether the allocated CPU resources are enough.

          Does this make sense? I'm not sure I'm making things as clear as they could be.

          Cheers,
          Chris

          criccomini Chris Riccomini added a comment -

          For the record, I also tested that setting values where cfs_quota_us > cfs_period_us works, and behaves as expected. The behavior appears to be:

          1. cfs_period_us is the period for a single cpu
          2. granting a cfs_quota_us > period allows you to use more than one core, as expected.

          That is, setting this:

          [app@eat1-qa466 criccomi]$ cat /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_000003/cpu.cfs_period_us
          100000
          [app@eat1-qa466 criccomi]$ cat /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_000003/cpu.cfs_quota_us
          200000
          

          Lets the container use 200% of CPU (in top). Likewise, setting to 150000 gives 150% in top.

          criccomini Chris Riccomini added a comment -

          Hey Sandy,

          Thinking about this more, relative to YARN-782. What about:

          1. yarn-site.xml contains NM's physical cores
          2. yarn-site.xml contains NM's physical core:virtual core ratio
          3. container resource request is the % of a single virtual core that's needed. If you say 100%, it means you need 1 vcore; if you say 200%, you need 2 vcores.
          4. yarn-site.xml contains a config in yarn.nodemanager.linux-container-executor, or yarn.scheduler that toggles whether to hard-limit ceilings (don't give excess capacity).

          The second item (pcore/vcore ratio) would be required so that users can reason about the speed of a core in a heterogeneous hardware environment.

          Thoughts?

          sandyr Sandy Ryza added a comment -

          OK, I understand why my original solution isn't sufficient, and why pcore-vcore-ratio may be needed in clusters with heterogeneous hardware. I'll think about this a little more and get back. Assigning this to myself, but feel free to steal it if you were planning to work on it.

          sandyr Sandy Ryza added a comment -

          I've thought about this a little more. What seems preferable to me is to still get rid of the vcore-pcore ratio, but to have a configurable max CPU percent (which can be greater than 100) that YARN processes may take up in total, and to also have the yarn.scheduler.enforce-ceiling-vcores that you suggest. If enforcement is on, each container's cpu.cfs_quota_us would be configured in such a way that it receives (container's allocated vcores/yarn.nodemanager.cpu-ceiling-percent) of the machine's total CPU. Does that make sense? I can justify further why I think we should remove the vcore-pcore ratio if that would be helpful.

          sandyr Sandy Ryza added a comment -

          Sorry, the ratio in my last comment should be (container's allocated vcores * yarn.nodemanager.cpu-ceiling-percent) / yarn.nodemanager.cpu-vcores

          criccomini Chris Riccomini added a comment -

          Hey Sandy,

          I'm with you up until this line:

          If enforcement is on, each container's cpu.cfs_quota_us would be configured in such a way that it receives (container's allocated vcores/yarn.nodemanager.cpu-ceiling-percent) of the machine's total CPU.

          Maybe a concrete example is more clear?

          Cheers,
          Chris

          criccomini Chris Riccomini added a comment -

          Hey Sandy,

          What about just:

          yarn.nodemanager.cpu-total-hz
          yarn.scheduler.cpu-enforce-ceiling
          

          Containers then just request the hz that they need. If yarn.scheduler.cpu-enforce-ceiling is on, then:

          cpu.cfs_quota_us=(container hz request / yarn.nodemanager.cpu-total-hz) * 1000000
          cpu.cfs_period_us=1000000
          
          sandyr Sandy Ryza added a comment -

          Does it make more sense with the updated (correct) ratio I posted? The idea is that the configured maximum cpu percent is split evenly among a node's configured vcores.

          criccomini Chris Riccomini added a comment -

          Hey Sandy,

          It's unclear to me why we'd need vcores if NMs have a max cpu percent (e.g. 100% = 1ghz). In such a case, why bother with vcores? Containers could just request the actual percent (or hz) that they need directly. This is essentially what I'm proposing in the comment above (the cpu-total-hz one).

          Cheers,
          Chris

          sandyr Sandy Ryza added a comment -

          Regarding hz vs. virtual cores, there was some discussion about this on YARN-2. I think the worry about using hz is that the actual performance of a core relative to its hz can vary significantly between different CPU architectures. The thought was that we would be able to standardize by declaring that a vcore is equivalent to, say, a 1Ghz Intel Xeon 2010 core, and then figure out where other processors stand relative to this.

          I also think it might be too late to make the change given the proximity to releasing 2.1.0-beta.

          criccomini Chris Riccomini added a comment -

          Hey Sandy,

          Ah ha. Fair enough. But, then I think I'm confused about your proposal. The thing that's jamming me up is that you suggest removing the vcore-to-pcore ratio. Without that, it's unclear to me what a vcore is, or its relation to pcores. It's also unclear to me what yarn.nodemanager.cpu-ceiling-percent is.

          Basically, I'm confused.

          Cheers,
          Chris

          sandyr Sandy Ryza added a comment -

          Chris and I chatted about this offline to get on the same page. Here's what we settled on (Chris, correct me if I'm misrepresenting):
          After this JIRA and YARN-810, the NodeManager will have two settings: yarn.nodemanager.resource.cpu-vcores, which takes an integer and yarn.nodemanager.cpu-enforce-ceiling, which takes a boolean.

          We set the following for every container:
          cpu.shares=CPU_DEFAULT_WHATEVER * <requested vcores>
          If enforce ceiling is on, we also set
          cpu.cfs_quota_us=(<requested vcores>/yarn.nodemanager.resource.cpu-vcores)*PERIOD
          cpu.cfs_period_us=PERIOD

          where CPU_DEFAULT_WHATEVER and PERIOD are constants with values something like 1024 and 1000000
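
          As a rough sketch, the per-container cgroup writes under this scheme would look something like the following (constant values, vcore counts, and the cgroup path are illustrative, not taken from an actual patch):

              SHARES_PER_VCORE=1024     # "CPU_DEFAULT_WHATEVER" above (illustrative value)
              PERIOD_US=1000000         # "PERIOD" above (illustrative value)
              NM_VCORES=8               # yarn.nodemanager.resource.cpu-vcores (example value)
              REQUESTED_VCORES=2
              ENFORCE_CEILING=true      # yarn.nodemanager.cpu-enforce-ceiling
              CG=/cgroup/cpu/hadoop-yarn/container_example   # hypothetical cgroup path

              echo $(( SHARES_PER_VCORE * REQUESTED_VCORES )) > $CG/cpu.shares
              if [ "$ENFORCE_CEILING" = "true" ]; then
                echo $PERIOD_US > $CG/cpu.cfs_period_us
                # (requested vcores / NM vcores) * PERIOD => 250000 with these numbers
                echo $(( REQUESTED_VCORES * PERIOD_US / NM_VCORES )) > $CG/cpu.cfs_quota_us
              fi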

          tucu00 Alejandro Abdelnur added a comment -

          IMO the settings to enable ceiling should be:

          • yarn.nodemanager.cpu-enforce-ceiling.enabled=true|false. Enables/disables ceiling enforcement in the NM.

          And an application should be able to specify if ceiling is required or not.

          criccomini Chris Riccomini added a comment -

          yarn.nodemanager.cpu-enforce-ceiling.enabled=true|false. Enables/disables ceiling enforcement in the NM.

          I'm not too opinionated about the config name. The one you propose sounds good to me. I would encourage more docs, though:

          Enables/disables ceiling enforcement in the NM. If set to true, containers will not be allowed to use excess CPU capacity beyond what was requested, even if it's available.

          And an application should be able to specify if ceiling is required or not.

          I agree. I think this is safe, so long as cpu.shares is always set appropriately. That is, I think it should be fine to inter-mingle procs with ceiling enforcement with those that don't have it. The CFS should take care of things as expected (provided cpu.shares is set appropriately for all tasks).

          tucu00 Alejandro Abdelnur added a comment -

          YARN container cpu shares are always (app.vcores * 1024), so this value can be set as the cpu.cfs_quota_us for the container process; the NM knows its total number of vcores, so it can set cpu.cfs_period_us=(total.vcores * 1024).

          revans2 Robert Joseph Evans added a comment -

          Sorry, I am a bit late to this discussion. I don't like the config being global; I think it needs to be on a per-container basis.

          There are certain cases where this is desirable. There are also certain cases where it might be desirable to have a hard limit on CPU usage, and not allow the process to go above the specified resource requirement, even if it's available.

          The question is: are there ever two different applications running on the same cluster where it is desirable for one and not for the other? I believe there are. I argued this in YARN-102, where you want to measure how long an application will take to run under a specific CPU resource request. If I allow it to go over, I will never know how long it would take in the worst case, and so I will never know if my config is correct unless I can artificially limit it. But in production I don't want to run the worst case every time, and I don't want a special test cluster to see what the worst case is.

          tucu00 Alejandro Abdelnur added a comment -

          Roger Evans, "And an application should be able to specify if ceiling is required or not."

          ywskycn Wei Yan added a comment -

          Upload a patch for review.
          (1) Add a configuration field cpu_enforce_ceiling_enabled to the ApplicationSubmissionContext. Each application can set this field to true (default is false) if it wants cpu ceiling enforcement.
          (2) The RM notifies the NM of the containers with cpu_enforce_ceiling_enabled through the heartbeat. The heartbeat response message contains a list of containerIds which are launched at the current node and have ceiling enabled.
          (3) The CgroupsLCEResourcesHandler will set cpu.cfs_period_us and cpu.cfs_quota_us for containers with ceiling enabled.
          (4) Update the distributed shell example to include the cpu_enforce_ceiling_enabled configuration, so we can test this feature using distributedshell.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12656584/YARN-810.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
          org.apache.hadoop.yarn.util.TestFSDownload
          org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
          org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices
          org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
          org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4364//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4364//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12656675/YARN-810.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
          org.apache.hadoop.yarn.util.TestFSDownload

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4369//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4369//console

          This message is automatically generated.

          vvasudev Varun Vasudev added a comment -

          Sandy Ryza Wei Yan are you still working on this? If not, I'd like to pick it up.

          ywskycn Wei Yan added a comment -

          Varun Vasudev, thanks for the offer. I'm still working on this.

          vvasudev Varun Vasudev added a comment -

          Wei Yan thanks for letting me know! Some comments on your patch -

          1. In CgroupsLCEResourcesHandler.java, you set cfs_period_us to nmShares and cfs_quota_us to cpuShares. From the RedHat documentation, cfs_period_us and cfs_quota_us operate on a CPU basis. From the documentation

          Note that the quota and period parameters operate on a CPU basis. To allow a process to fully utilize two CPUs, for example, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 100000.

          With your current implementation, on a machine with 4 cores (and 4 vcores), a container which requests 2 vcores will have cfs_period_us set to 4096 and cfs_quota_us set to 2048, which will end up limiting it to 50% of one CPU. Is my understanding wrong? (A sketch of the per-CPU computation follows after this list.)

          2. This is just nitpicking, but is it possible to change CpuEnforceCeilingEnabled (and its variants) to just CpuCeilingEnabled or CpuCeilingEnforced?
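
          As a sketch of point 1, a computation that respects the per-CPU semantics might look like this (illustrative values only; the quota may legitimately exceed the period when a container is entitled to more than one core):

              PERIOD_US=100000      # per-CPU period (the default seen in the transcripts above)
              NUM_PCORES=4          # physical cores on the node (example)
              NM_VCORES=4           # yarn.nodemanager.resource.cpu-vcores (example)
              REQUESTED_VCORES=2

              QUOTA_US=$(( REQUESTED_VCORES * NUM_PCORES * PERIOD_US / NM_VCORES ))
              echo $QUOTA_US        # => 200000, i.e. up to 2 full CPUs out of 4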

          ywskycn Wei Yan added a comment -

          With your current implementation, on a machine with 4 cores(and 4 vcores), a container which requests 2 vcores will have cfs_period_us set to 4096 and cfs_quota_us set to 2048 which will end up limiting it to 50% of one CPU. Is my understanding wrong?

          Thanks, Varun Vasudev. I mentioned this problem after reading your YARN-2420 patch. I'll double check this problem, and will update the patch.

          ywskycn Wei Yan added a comment -

          Varun Vasudev, for the cfs_quota_us and cfs_period_us settings problem, since we need to get the number of physical cores used by YARN, I'll update the patch here once your YARN-2440 is committed.

          beckham007 Beckham007 added a comment -

          Hi, Wei Yan and Varun Vasudev. Both this issue and YARN-2440 are doing cpu core isolation for containers. In our production cluster, if the number of vcores is larger than the number of pcores, the NM can effectively "crash" (the system processes couldn't get cpu time), so these issues are worthwhile.
          However, using cfs_quota_us and cfs_period_us requires many changes in the LCE; we would even have to modify ContainerLaunch. I think cpu/memory/diskio should be the first-class resources for isolation, with cfs_quota_us and cfs_period_us second.
          I also think we should refactor the LCE to support more cgroups subsystems, as in YARN-2139 and YARN-2140. In that case, we could use cpuset for cpu core isolation.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12656675/YARN-810.patch
          against trunk revision d71d40a.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5515//console

          This message is automatically generated.

          ywskycn Wei Yan added a comment -

          Rebase a new patch for review.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688757/YARN-810-3.patch
          against trunk revision fdf042d.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 52 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6170//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6170//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6170//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688764/YARN-810-4.patch
          against trunk revision fdf042d.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 52 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6171//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-sls.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6171//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6171//console

          This message is automatically generated.

          ywskycn Wei Yan added a comment -

          Update a new patch which encodes the per-app cpu_ceiling_enforce setting in the container's token.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12689009/YARN-810-5.patch
          against trunk revision 4f18018.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6187//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6187//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6187//console

          This message is automatically generated.

          ywskycn Wei Yan added a comment -

          Update a patch to fix the test failures.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12691002/YARN-810-6.patch
          against trunk revision ae91b13.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6286//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6286//console

          This message is automatically generated.

          ywskycn Wei Yan added a comment -

          The latest TestResourceLocalizationService failure is unrelated, and the test passes locally. Karthik Kambatla, Varun Vasudev, could you help review the latest patch?

          gtCarrera9 Li Lu added a comment -

          The latest patch does not apply on trunk. For now I'm canceling this patch. Sandy Ryza, it would be great if you could update the patch. Thank you!

          kasha Karthik Kambatla added a comment -

          Hasn't this been recently added with strict cpu usage?


            People

            • Assignee: ywskycn Wei Yan
            • Reporter: criccomini Chris Riccomini
            • Votes: 0
            • Watchers: 48

              Dates

              • Created:
                Updated:

                Development