Hadoop YARN / YARN-6956

preemption may only consider resource requests for one node

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.9.0, 3.0.0-beta1
    • Fix Version/s: None
    • Component/s: fairscheduler
    • Labels: None
    • Environment: CDH 5.11.0

    Description

      I'm observing the following series of events on a CDH 5.11.0 cluster, which seems to be possible after YARN-6163:

      1. An application is considered to be starved, so FSPreemptionThread calls identifyContainersToPreempt, and that calls FSAppAttempt#getStarvedResourceRequests to get a list of ResourceRequest instances that are enough to address the app's starvation.

      2. The first ResourceRequest that getStarvedResourceRequests sees is enough to address the app's starvation, so we break out of the loop over appSchedulingInfo.getAllResourceRequests() after only one iteration: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L1180. We return only this one ResourceRequest to the identifyContainersToPreempt method.

      3. It turns out that this particular ResourceRequest happens to have a value for getResourceName that identifies a specific node in the cluster. This causes preemption to consider only containers on that node, and not the rest of the cluster.
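
The three steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual FairScheduler code: the class StarvedRequestSketch, its Request type, and the methods starvedRequests and candidateNodes are hypothetical stand-ins, resources are reduced to a single memory figure, and rack-level resource names are ignored. Only the "*" wildcard mirrors a real constant (ResourceRequest.ANY).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical stand-ins for illustration; not the real YARN classes.
final class StarvedRequestSketch {
    // Same value as ResourceRequest.ANY in YARN: the request may run on any node.
    static final String ANY = "*";

    /** A pared-down resource request: location, per-container memory, container count. */
    static final class Request {
        final String resourceName;
        final int memoryMb;
        final int numContainers;

        Request(String resourceName, int memoryMb, int numContainers) {
            this.resourceName = resourceName;
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    /** Step 2: collect requests until the starvation amount is covered, then stop. */
    static List<Request> starvedRequests(List<Request> all, int starvationMb) {
        List<Request> picked = new ArrayList<>();
        int pendingMb = starvationMb;
        for (Request r : all) {
            if (pendingMb <= 0) {
                break; // starvation already covered; later requests are never looked at
            }
            picked.add(r);
            pendingMb -= r.memoryMb * r.numContainers;
        }
        return picked;
    }

    /** Step 3: a node-specific request restricts the search to that one node. */
    static List<String> candidateNodes(Request r, List<String> clusterNodes) {
        if (ANY.equals(r.resourceName)) {
            return clusterNodes;
        }
        return clusterNodes.contains(r.resourceName)
            ? Collections.singletonList(r.resourceName)
            : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3");
        // The node-local request happens to come first in iteration order.
        List<Request> all = Arrays.asList(
            new Request("node2", 1024, 4),
            new Request(ANY, 1024, 4));

        List<Request> picked = starvedRequests(all, 2048);
        System.out.println("requests returned: " + picked.size());
        System.out.println("nodes considered: " + candidateNodes(picked.get(0), nodes));
    }
}
```

In this sketch, when the node-local request is listed first it alone covers the starvation, so the wildcard request is never returned and only node2 is searched for containers to preempt. If the wildcard request came first instead, all three nodes would be considered.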

      kasha, does that make sense? I'm happy to submit a patch if I'm understanding the problem correctly.

      Attachments

        1. YARN-6956.001.patch
          13 kB
          Steven Rand

            People

              Assignee: Steven Rand
              Reporter: Steven Rand
              Votes: 0
              Watchers: 4
