Hadoop YARN / YARN-4931

Preempted resources go back to the same application

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: fairscheduler
    • Labels: None

    Description

      Sometimes a queue that needs resources triggers preemption, but the preempted containers are allocated right back to the application that just released them!

      Here is a tiny application (0007) that wants resources, and a container is preempted from application 0002 to satisfy it:

      2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Should preempt <memory:448, vCores:0> res for queue root.default: resDueToMinShare = <memory:0, vCores:0>, resDueToFairShare = <memory:448, vCores:0>
      2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Preempting container (prio=1res=<memory:15264, vCores:1>) from queue root.milesc
      2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics (FairSchedulerUpdateThread): Non-AM container preempted, current appAttemptId=appattempt_1460047303577_0002_000001, containerId=container_1460047303577_0002_01_001038, resource=<memory:15264, vCores:1>
      2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container Transitioned from RUNNING to KILLED
      

      But a moment later, application 0002 gets the resources right back, as a new container of the same capacity:

      2016-04-07 21:08:13,844 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode (ResourceManager Event Processor): Assigned container container_1460047303577_0002_01_001039 of capacity <memory:15264, vCores:1> on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, <memory:241248, vCores:18> used and <memory:416, vCores:46> available after allocation
      2016-04-07 21:08:14,555 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 Container Transitioned from ALLOCATED to ACQUIRED
      2016-04-07 21:08:14,845 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1460047303577_0002_01_001039 Container Transitioned from ACQUIRED to RUNNING
      

      As a result, new applications cannot get even an ApplicationMaster (AM) container, so they never start.
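The pattern in the two log excerpts above (a "Non-AM container preempted" line followed shortly by an "Assigned container" line for the same application) can be searched for mechanically. The following is a minimal sketch, not part of Hadoop: the function name `find_reassignments`, the regular expressions, and the 5-second window are assumptions made for illustration, and the patterns are modeled only on the log lines quoted in this report. It does not check that the new container landed on the same node, so treat a hit as a hint rather than proof.

```python
import re
from datetime import datetime

# Patterns modeled on the ResourceManager log lines quoted in this report.
# They assume container IDs of the form container_<clusterTs>_<appSeq>_<attempt>_<num>.
PREEMPTED = re.compile(
    r"Non-AM container preempted, current appAttemptId=appattempt_(\d+_\d+)_\d+, "
    r"containerId=(container_\w+)"
)
ASSIGNED = re.compile(r"Assigned container (container_(\d+_\d+)_\d+_\d+) of capacity")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


def find_reassignments(lines, window_seconds=5.0):
    """Return (app, preempted_container, new_container, gap_seconds) tuples.

    A tuple is reported when an application is assigned a new container
    within window_seconds of one of its containers being preempted.
    """
    last_preempted = {}  # app id -> (time, container id) of latest preemption
    hits = []
    for raw in lines:
        line = raw.strip()
        match = PREEMPTED.search(line)
        if match:
            last_preempted[match.group(1)] = (timestamp(line), match.group(2))
            continue
        match = ASSIGNED.search(line)
        if match and match.group(2) in last_preempted:
            when, victim = last_preempted[match.group(2)]
            gap = (timestamp(line) - when).total_seconds()
            if 0 <= gap <= window_seconds:
                hits.append((match.group(2), victim, match.group(1), gap))
    return hits
```

To scan a whole log, pass an open file, for example `find_reassignments(open("resourcemanager.log"))`. Each hit names the application, the preempted container, the container assigned afterwards, and the gap between the two events in seconds.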

      Attachments

        1. resourcemanager.log
          86 kB
          Miles Crawford

            People

              Assignee: Unassigned
              Reporter: Miles Crawford (milesc)
              Votes: 0
              Watchers: 9
