
[YARN-7290] Method canContainerBePreempted can return true when it shouldn't


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-beta1
    • Fix Version/s: 3.0.0, 3.1.0
    • Component/s: fairscheduler
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In FSAppAttempt#canContainerBePreempted, we make sure that preempting the given container would not put the app below its fair share:

          // Check if the app's allocation will be over its fairshare even
          // after preempting this container
          Resource usageAfterPreemption = Resources.clone(getResourceUsage());
      
          // Subtract resources of containers already queued for preemption
          synchronized (preemptionVariablesLock) {
            Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
          }
      
          // Subtract this container's allocation to compute usage after preemption
          Resources.subtractFrom(
              usageAfterPreemption, container.getAllocatedResource());
          return !isUsageBelowShare(usageAfterPreemption, getFairShare());
      

      However, this only considers one container in isolation, and fails to consider containers for the same app that we already added to preemptableContainers in FSPreemptionThread#identifyContainersToPreemptOnNode. Therefore we can have a case where we preempt multiple containers from the same app, none of which by itself puts the app below fair share, but which cumulatively do so.
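
      For context, here is a rough sketch of where that per-container check is applied, namely the selection loop in FSPreemptionThread#identifyContainersToPreemptOnNode (simplified, with some names approximated; not the exact upstream code). Nothing feeds the containers already collected in preemptableContainers back into the fair-share check:

          // Simplified sketch of the per-node selection loop (names approximate).
          for (RMContainer container : containersToCheck) {
            FSAppAttempt app =
                scheduler.getSchedulerApp(container.getApplicationAttemptId());

            // Per-container check only; earlier picks from the same app in
            // preemptableContainers are not taken into account.
            if (app.canContainerBePreempted(container)) {
              preemptableContainers.addContainer(container);
              Resources.addTo(potential, container.getAllocatedResource());
            }

            // Stop once the identified containers can satisfy the request.
            if (Resources.fitsIn(request, potential)) {
              return preemptableContainers;
            }
          }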

      I've attached a patch with a test to show this behavior. The flow is:

      1. Initially greedyApp runs in root.preemptable.child-1 and is allocated all the resources (8g and 8vcores)
      2. Then starvingApp runs in root.preemptable.child-2 and requests 2 containers, each of which is 3g and 3vcores in size. At this point both greedyApp and starvingApp have a fair share of 4g (with DRF not in use).
      3. For the first container requested by starvingApp, we (correctly) preempt 3 containers from greedyApp, each of which is 1g and 1vcore.
      4. For the second container requested by starvingApp, we again (this time incorrectly) preempt 3 containers from greedyApp. This puts greedyApp below its fair share, but it happens anyway because on all six occasions that !isUsageBelowShare(usageAfterPreemption, getFairShare()) is evaluated, the value of usageAfterPreemption is 7g and 7vcores (confirmed using a debugger); the arithmetic is spelled out in the sketch after this list.
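
      A small standalone sketch of the arithmetic behind step 4 (plain Java with illustrative numbers taken from the test above; not scheduler code):

          // Illustrative only: greedyApp uses 8g, its fair share is 4g, and
          // each preempted container frees 1g.
          int usage = 8, fairShare = 4, containerSize = 1;
          for (int i = 1; i <= 6; i++) {
            // Current per-container check: resourcesToBePreempted is still
            // empty while containers are being identified, so every check
            // sees 8g - 1g = 7g, which is not below the 4g fair share.
            boolean perContainer = (usage - containerSize) >= fairShare;

            // Cumulative check: also subtract everything selected so far.
            boolean cumulative = (usage - i * containerSize) >= fairShare;

            System.out.printf("container %d: per-container=%b cumulative=%b%n",
                i, perContainer, cumulative);
            // Prints true/true for i = 1..4, and true/false for i = 5 and 6,
            // i.e. only the cumulative check rejects the last two preemptions.
          }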

      So in addition to accounting for resourcesToBePreempted, we also need to account for containers that we're already planning on preempting in FSPreemptionThread#identifyContainersToPreemptOnNode.
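
      One possible shape for that accounting (a sketch only; the extra parameter and how the caller accumulates it are illustrative, not necessarily what the committed patch does):

          // Sketch: let the caller pass in the resources it has already
          // selected from this app during the current identification pass,
          // so the fair-share check becomes cumulative.
          boolean canContainerBePreempted(RMContainer container,
              Resource alreadySelectedFromApp) {
            Resource usageAfterPreemption = Resources.clone(getResourceUsage());

            // Subtract resources of containers already queued for preemption
            synchronized (preemptionVariablesLock) {
              Resources.subtractFrom(usageAfterPreemption, resourcesToBePreempted);
            }

            // Subtract containers picked earlier in this identification pass
            // (they are not yet reflected in resourcesToBePreempted)
            if (alreadySelectedFromApp != null) {
              Resources.subtractFrom(usageAfterPreemption, alreadySelectedFromApp);
            }

            // Subtract this container's allocation to compute usage after preemption
            Resources.subtractFrom(
                usageAfterPreemption, container.getAllocatedResource());
            return !isUsageBelowShare(usageAfterPreemption, getFairShare());
          }

      The caller in identifyContainersToPreemptOnNode would then keep a per-app running total (for example in a Map<FSAppAttempt, Resource>) and add each selected container's allocation to it before checking the next candidate from the same app.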

      Attachments

        1. YARN-7290.005.patch (13 kB, Steven Rand)
        2. YARN-7290.004.patch (12 kB, Steven Rand)
        3. YARN-7290.003.patch (11 kB, Steven Rand)
        4. YARN-7290.002.patch (8 kB, Steven Rand)
        5. YARN-7290.001.patch (8 kB, Steven Rand)
        6. YARN-7290-failing-test.patch (2 kB, Steven Rand)


          People

            Assignee: Steven Rand
            Reporter: Steven Rand
            Votes: 0
            Watchers: 6
