Hadoop YARN / YARN-4931

Preempted resources go back to the same application

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: fairscheduler
    • Labels: None

      Description

      Sometimes a queue that needs resources triggers preemption, but the preempted containers are allocated right back to the application that just released them!

      Here a tiny application (0007) wants resources, and a container is preempted from application 0002 to satisfy it:

      2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Should preempt <memory:448, vCores:0> res for queue root.default: resDueToMinShare = <memory:0, vCores:0>, resDueToFairShare = <memory:448, vCores:0>
      2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Preempting container (prio=1res=<memory:15264, vCores:1>) from queue root.milesc
      2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics (FairSchedulerUpdateThread): Non-AM container preempted, current appAttemptId=appattempt_1460047303577_0002_000001, containerId=container_1460047303577_0002_01_001038, resource=<memory:15264, vCores:1>
      2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container Transitioned from RUNNING to KILLED
      

      But a moment later, application 0002 gets a container of the same size right back:

      2016-04-07 21:08:13,844 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode (ResourceManager Event Processor): Assigned container container_1460047303577_0002_01_001039 of capacity <memory:15264, vCores:1> on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, <memory:241248, vCores:18> used and <memory:416, vCores:46> available after allocation
      2016-04-07 21:08:14,555 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 Container Transitioned from ALLOCATED to ACQUIRED
      2016-04-07 21:08:14,845 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1460047303577_0002_01_001039 Container Transitioned from ACQUIRED to RUNNING
      

      As a result, new applications cannot even get an AM container, and never start at all.
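
To check how often this happens in a ResourceManager log, the pattern above can be searched for mechanically. The following is a minimal sketch, not part of YARN: the function name `find_reassignments` and the 5-second window are assumptions chosen for illustration, and the regular expressions assume the log line formats quoted in this report. It only flags allocations that go to an application shortly after that application lost a container to preemption; it does not prove the same resources were reused.

```python
import re
import sys
from datetime import datetime, timedelta

# Patterns assume the log formats quoted in this report.
TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
# container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>, optional epoch prefix
CONTAINER = r"container_(?:e\d+_)?(?P<app>\d+_\d+)_\d+_\d+"
PREEMPTED = re.compile(TS + r".*Non-AM container preempted.*containerId=" + CONTAINER)
ASSIGNED = re.compile(TS + r".*Assigned container " + CONTAINER)


def parse_ts(text):
    """Parse a log timestamp such as '2016-04-07 21:08:13,463'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def find_reassignments(lines, window=timedelta(seconds=5)):
    """Return (app, preempted_at, assigned_at) for each container assigned
    to an application within `window` after that application had a
    container preempted. The window is a hypothetical heuristic."""
    last_preempted = {}  # app id -> time of its most recent preemption
    hits = []
    for line in lines:
        m = PREEMPTED.search(line)
        if m:
            last_preempted[m.group("app")] = parse_ts(m.group("ts"))
            continue
        m = ASSIGNED.search(line)
        if m:
            app = m.group("app")
            assigned_at = parse_ts(m.group("ts"))
            preempted_at = last_preempted.get(app)
            if preempted_at is not None and (
                timedelta(0) <= assigned_at - preempted_at <= window
            ):
                hits.append((app, preempted_at, assigned_at))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_reassignments.py resourcemanager.log
    with open(sys.argv[1]) as log:
        for app, lost, regained in find_reassignments(log):
            print(f"application {app}: preempted {lost}, reassigned {regained}")
```

Run against the log lines quoted above, this should flag application 1460047303577_0002, whose new container is assigned 381 ms after the preemption.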

        Attachments

        1. resourcemanager.log
          86 kB
          Miles Crawford

              People

              • Assignee: Unassigned
              • Reporter: Miles Crawford (milesc)
              • Votes: 0
              • Watchers: 10