Hadoop YARN
  1. Hadoop YARN
  2. YARN-201

CapacityScheduler can take a very long time to schedule containers if requests are off cluster

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 2.0.1-alpha
    • Fix Version/s: 2.0.3-alpha, 0.23.5
    • Component/s: capacityscheduler
    • Labels:
      None

      Description

      When a user runs a job where one of the input files is a large file on another cluster, the job can create many splits on nodes which are unreachable for computation from the current cluster. The off-switch delay logic in LeafQueue can cause the ResourceManager to allocate containers for the job very slowly. In one case the job was only getting one container every 23 seconds, and the queue had plenty of spare capacity.

      1. YARN-201.patch
        4 kB
        Jason Lowe
      2. YARN-201.patch
        4 kB
        Jason Lowe

        Activity

        Hide
        Jason Lowe added a comment -

        I looked into why 1.x jobs don't have this issue. I believe it's because of how JobInProgress is resetting scheduling opportunities. When it allocates a node-local or rack-local task it resets the opportunities, but if it allocates a non-local task then it explicitly avoids resetting the opportunities, with a comment stating as such. YARN's CapacityScheduler resets the scheduling opportunities for any container allocated.

        So one potential fix is to emulate the 1.x behavior and not reset scheduling opportunities when we end up assigning a non-local container. Another is to have the CapacityScheduler track the active nodes/racks and filter requests for nodes/racks that aren't in that list when we normalize the ask list from the AM. This would keep us from waiting around for scheduling opportunities that will never help us meet locality for an ask that's off-cluster.

        Show
        Jason Lowe added a comment - I looked into why 1.x jobs don't have this issue. I believe it's because of how JobInProgress is resetting scheduling opportunities. When it allocates a node-local or rack-local task it resets the opportunities, but if it allocates a non-local task then it explicitly avoids resetting the opportunities, with a comment stating as such. YARN's CapacityScheduler resets the scheduling opportunities for any container allocated. So one potential fix is to emulate the 1.x behavior and not reset scheduling opportunities when we end up assigning a non-local container. Another is to have the CapacityScheduler track the active nodes/racks and filter requests for nodes/racks that aren't in that list when we normalize the ask list from the AM. This would keep us from waiting around for scheduling opportunities that will never help us meet locality for an ask that's off-cluster.
        Hide
        Jason Lowe added a comment -

        Thought a bit about filtering the AM's requests based on what resources are "active" (i.e.: nodes/racks we know about and are capable of launching containers), but it's a bit complicated with the latch-like protocol used between the AM and the RM. Definitely possible, just a bit complicated. In the interest of getting something working in this area sooner rather than later, here's a patch that emulates the behavior in 1.x where we don't reset the scheduling opportunities when allocating off-switch containers.

        Like 1.x, this still has an initial scheduling penalty for jobs that ask for containers on many off-cluster resources, but the job doesn't keep paying the penalty after each off-switch container is allocated. Ideally we shouldn't be paying the penalty at all, hence the idea of filtering locality requests based on what is capable of being local, but we can tackle that riskier proposition in another JIRA.

        Show
        Jason Lowe added a comment - Thought a bit about filtering the AM's requests based on what resources are "active" (i.e.: nodes/racks we know about and are capable of launching containers), but it's a bit complicated with the latch-like protocol used between the AM and the RM. Definitely possible, just a bit complicated. In the interest of getting something working in this area sooner rather than later, here's a patch that emulates the behavior in 1.x where we don't reset the scheduling opportunities when allocating off-switch containers. Like 1.x, this still has an initial scheduling penalty for jobs that ask for containers on many off-cluster resources, but the job doesn't keep paying the penalty after each off-switch container is allocated. Ideally we shouldn't be paying the penalty at all, hence the idea of filtering locality requests based on what is capable of being local, but we can tackle that riskier proposition in another JIRA.
        Hide
        Robert Joseph Evans added a comment -

        From a quick look the patch looks good. I only have one comment.

        // Reset scheduling opportunities if this was a localized request
        if (assignment.getType() != NodeType.OFF_SWITCH) {
          application.resetSchedulingOpportunities(priority);
        }
        

        Reset scheduling opportunities if this was a localized request

        seems unnecessary. The code is simpel enough that no comment is really needed but I would like to see a comment about why we don't reset them, so when someone else reads through the code they can understand why we are doing this, and update it accordingly.

        Show
        Robert Joseph Evans added a comment - From a quick look the patch looks good. I only have one comment. // Reset scheduling opportunities if this was a localized request if (assignment.getType() != NodeType.OFF_SWITCH) { application.resetSchedulingOpportunities(priority); } Reset scheduling opportunities if this was a localized request seems unnecessary. The code is simpel enough that no comment is really needed but I would like to see a comment about why we don't reset them, so when someone else reads through the code they can understand why we are doing this, and update it accordingly.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12552510/YARN-201.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/140//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/140//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12552510/YARN-201.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/140//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/140//console This message is automatically generated.
        Hide
        Jason Lowe added a comment -

        Patch updated with a more verbose comment on why we're not resetting scheduling opportunities when assigning non-local containers.

        Show
        Jason Lowe added a comment - Patch updated with a more verbose comment on why we're not resetting scheduling opportunities when assigning non-local containers.
        Hide
        Arun C Murthy added a comment -

        +1 - I just committed this. Thanks for hunting this down Jason!

        Show
        Arun C Murthy added a comment - +1 - I just committed this. Thanks for hunting this down Jason!
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #2975 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2975/)
        YARN-201. Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406834)

        Result = SUCCESS
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406834
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Show
        Hudson added a comment - Integrated in Hadoop-trunk-Commit #2975 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2975/ ) YARN-201 . Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406834) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406834 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #30 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/30/)
        YARN-201. Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406834)

        Result = SUCCESS
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406834
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Show
        Hudson added a comment - Integrated in Hadoop-Yarn-trunk #30 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/30/ ) YARN-201 . Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406834) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406834 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #429 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/429/)
        Merge -c 1406834 from trunk to branch-2 to fix YARN-201. Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406836)

        Result = SUCCESS
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406836
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #429 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/429/ ) Merge -c 1406834 from trunk to branch-2 to fix YARN-201 . Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406836) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406836 Files : /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1220 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1220/)
        YARN-201. Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406834)

        Result = SUCCESS
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406834
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1220 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1220/ ) YARN-201 . Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406834) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406834 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1250 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1250/)
        YARN-201. Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406834)

        Result = FAILURE
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406834
        Files :

        • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1250 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1250/ ) YARN-201 . Fix CapacityScheduler to be less conservative for starved off-switch requests. Contributed by Jason Lowe. (Revision 1406834) Result = FAILURE acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1406834 Files : /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java

          People

          • Assignee:
            Jason Lowe
            Reporter:
            Jason Lowe
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development