Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-5651

Backport Fair Scheduler queue placement policies to branch-1

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: scheduler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      YARN-1392 introduced general policies for assigning applications to queues in the YARN fair scheduler. This functionality would be useful and minimally invasive in MR1 as well.

      1. MAPREDUCE-5651.patch
        29 kB
        Ted Malaska
      2. MAPREDUCE-5651.5.patch
        28 kB
        Ted Malaska
      3. MAPREDUCE-5651.4.patch
        28 kB
        Ted Malaska
      4. MAPREDUCE-5651.3.patch
        28 kB
        Ted Malaska
      5. MAPREDUCE-5651.2.patch
        28 kB
        Ted Malaska

        Activity

        Hide
        Ted Malaska added a comment -

        Thx Sandy, I will down load the code tonight and hopefully have something my thx giving.

        Show
        Ted Malaska added a comment - Thx Sandy, I will down load the code tonight and hopefully have something my thx giving.
        Hide
        Ted Malaska added a comment -

        Ported the code but not the documentation yet.

        M src/core/org/apache/hadoop/security/GroupMappingServiceProvider.java
        M src/contrib/fairscheduler/src/test/org/apache/hadoop/mapred/TestFairScheduler.java
        A src/contrib/fairscheduler/src/test/org/apache/hadoop/mapred/TestQueuePlacementPolicy.java
        M src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java
        A src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/QueuePlacementRule.java
        A src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/SimpleGroupsMapping.java
        M src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/FairScheduler.java
        A src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/QueuePlacementPolicy.java

        Show
        Ted Malaska added a comment - Ported the code but not the documentation yet. M src/core/org/apache/hadoop/security/GroupMappingServiceProvider.java M src/contrib/fairscheduler/src/test/org/apache/hadoop/mapred/TestFairScheduler.java A src/contrib/fairscheduler/src/test/org/apache/hadoop/mapred/TestQueuePlacementPolicy.java M src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java A src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/QueuePlacementRule.java A src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/SimpleGroupsMapping.java M src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/FairScheduler.java A src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/QueuePlacementPolicy.java
        Hide
        Sandy Ryza added a comment -

        Ted Malaska, the patch is looking good. A few comments:

        +
         public class TestFairScheduler extends TestCase {
        

        Unnecessary new line

        +
        +
        +    placementPolicyConfig.set("user.name", "user1");
        

        Only need one line here

        +    scheduler.getPoolManager().placementPolicy = new QueuePlacementPolicy(
        +        rules, queues, conf);
        

        Can we save the pool manager in a variable instead of calling scheduler.getPoolManager() each time we use it in the test.

        +  static {
        +    configuredQueues.add("someuser");
        +  }
        

        Put this in the BeforeClass method

           public static final String EXPLICIT_POOL_PROPERTY = "mapred.fairscheduler.pool";
        +  
        

        Unnecessary newline

        +    try {
        +      return placementPolicy.assignAppToQueue(queueName, user);
        +    } catch (IOException ioe) {
        +      return Pool.DEFAULT_POOL_NAME;
        +    }
        

        To mirror the YARN behavior, we should log an error and reject the app if we hit an IOException.

        +
        +      System.out.println("queue:" + queue + " " + rule);
        +
        

        Take this out.

        +  public static final String  USER_AS_DEFAULT_QUEUE = 
        +    "mapred.fairscheduler.user.as.default.queue";
        +  public static final boolean DEFAULT_USER_AS_DEFAULT_QUEUE = true;
        

        No need to add this property, which only exists in MR2, in MR1. Instead, if a pool placement policy isn't specified, we should fall back to the MR1 default, which is determining the pool from mapred.fairscheduler.poolnameproperty.

        MR1 refers to "pools" instead of "queues". We should change QueuePlacementRule to PoolPlacementRule, assignAppToQueue to assignAppToPool, the queuePlacementPolicy XML element to poolPlacementPolicy, etc.

        Show
        Sandy Ryza added a comment - Ted Malaska , the patch is looking good. A few comments: + public class TestFairScheduler extends TestCase { Unnecessary new line + + + placementPolicyConfig.set( "user.name" , "user1" ); Only need one line here + scheduler.getPoolManager().placementPolicy = new QueuePlacementPolicy( + rules, queues, conf); Can we save the pool manager in a variable instead of calling scheduler.getPoolManager() each time we use it in the test. + static { + configuredQueues.add( "someuser" ); + } Put this in the BeforeClass method public static final String EXPLICIT_POOL_PROPERTY = "mapred.fairscheduler.pool" ; + Unnecessary newline + try { + return placementPolicy.assignAppToQueue(queueName, user); + } catch (IOException ioe) { + return Pool.DEFAULT_POOL_NAME; + } To mirror the YARN behavior, we should log an error and reject the app if we hit an IOException. + + System .out.println( "queue:" + queue + " " + rule); + Take this out. + public static final String USER_AS_DEFAULT_QUEUE = + "mapred.fairscheduler.user.as. default .queue" ; + public static final boolean DEFAULT_USER_AS_DEFAULT_QUEUE = true ; No need to add this property, which only exists in MR2, in MR1. Instead, if a pool placement policy isn't specified, we should fall back to the MR1 default, which is determining the pool from mapred.fairscheduler.poolnameproperty. MR1 refers to "pools" instead of "queues". We should change QueuePlacementRule to PoolPlacementRule, assignAppToQueue to assignAppToPool, the queuePlacementPolicy XML element to poolPlacementPolicy, etc.
        Hide
        Ted Malaska added a comment -

        Made changes requested by Sandy.

        Show
        Ted Malaska added a comment - Made changes requested by Sandy.
        Hide
        Sandy Ryza added a comment -

        Thanks Ted. A few more comments:

        +    //CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING,
        +    placementPolicyConfig.setClass("hadoop.security.group.mapping",
        

        Can take out the comment and just use CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING as the argument?

        All mentions of "queue" should still be replaced with mentions of "pool".

        +    rules.add(new QueuePlacementRule.Specified().initialize(true, null));
        

        In simple placement rules, create param on the "specified" rule should only be true if allow undeclared pools property is true.

        Show
        Sandy Ryza added a comment - Thanks Ted. A few more comments: + //CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING, + placementPolicyConfig.setClass( "hadoop.security.group.mapping" , Can take out the comment and just use CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING as the argument? All mentions of "queue" should still be replaced with mentions of "pool". + rules.add( new QueuePlacementRule.Specified().initialize( true , null )); In simple placement rules, create param on the "specified" rule should only be true if allow undeclared pools property is true.
        Hide
        Ted Malaska added a comment -

        Will do now thanks sandy

        Show
        Ted Malaska added a comment - Will do now thanks sandy
        Hide
        Ted Malaska added a comment -

        Removed the commit and changed "queue" to "pool" and "app" to "job".

        Show
        Ted Malaska added a comment - Removed the commit and changed "queue" to "pool" and "app" to "job".
        Hide
        Sandy Ryza added a comment -

        Thanks Ted. getSimplePlacementRules still needs the change at the bottom of my last comment. After that, the patch looks good to me.

        Show
        Sandy Ryza added a comment - Thanks Ted. getSimplePlacementRules still needs the change at the bottom of my last comment. After that, the patch looks good to me.
        Hide
        Ted Malaska added a comment -

        Thank you Sandy. I will get on that now.

        Show
        Ted Malaska added a comment - Thank you Sandy. I will get on that now.
        Hide
        Ted Malaska added a comment -

        Updated the getSimplePlacementRules function to change the Specified create option based on ALLOW_UNDECLARED_POOLS_KEY

        Show
        Ted Malaska added a comment - Updated the getSimplePlacementRules function to change the Specified create option based on ALLOW_UNDECLARED_POOLS_KEY
        Hide
        Ted Malaska added a comment -

        Wait I want to make that call cleaner. One minute.

        Show
        Ted Malaska added a comment - Wait I want to make that call cleaner. One minute.
        Hide
        Ted Malaska added a comment -

        Did a little clean up

        Show
        Ted Malaska added a comment - Did a little clean up
        Hide
        Sandy Ryza added a comment -

        +1

        Show
        Sandy Ryza added a comment - +1
        Hide
        Sandy Ryza added a comment -

        One more thing: have you checked that all the tests in TestFairScheduler pass?

        Show
        Sandy Ryza added a comment - One more thing: have you checked that all the tests in TestFairScheduler pass?
        Hide
        Ted Malaska added a comment -

        Just ran it again to double check:

        /home/cloudera/svn/hadoop-branch-1/src/contrib/fairscheduler
        [root@localhost fairscheduler]# ant test
        Buildfile: build.xml

        check-contrib:

        init:
        [echo] contrib: fairscheduler

        init-contrib:

        ivy-download:
        [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
        [get] To: /home/cloudera/svn/hadoop-branch-1/ivy/ivy-2.1.0.jar
        [get] Not modified - so not downloaded

        ivy-probe-antlib:

        ivy-init-antlib:

        ivy-init:
        [ivy:configure] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
        [ivy:configure] :: loading settings :: file = /home/cloudera/svn/hadoop-branch-1/ivy/ivysettings.xml

        ivy-resolve-common:

        ivy-retrieve-common:
        [ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead
        [ivy:cachepath] :: loading settings :: file = /home/cloudera/svn/hadoop-branch-1/ivy/ivysettings.xml

        compile:
        [echo] contrib: fairscheduler
        [javac] Compiling 1 source file to /home/cloudera/svn/hadoop-branch-1/build/contrib/fairscheduler/classes
        [javac] Note: /home/cloudera/svn/hadoop-branch-1/src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java uses or overrides a deprecated API.
        [javac] Note: Recompile with -Xlint:deprecation for details.

        compile-examples:

        compile-test:
        [echo] contrib: fairscheduler

        test:
        [echo] contrib: fairscheduler
        [delete] Deleting directory /home/cloudera/svn/hadoop-branch-1/build/contrib/fairscheduler/test/logs
        [mkdir] Created dir: /home/cloudera/svn/hadoop-branch-1/build/contrib/fairscheduler/test/logs
        [junit] Running org.apache.hadoop.mapred.TestCapBasedLoadManager
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.419 sec
        [junit] Running org.apache.hadoop.mapred.TestComputeFairShares
        [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 0.108 sec
        [junit] Running org.apache.hadoop.mapred.TestFairScheduler
        [junit] Tests run: 39, Failures: 0, Errors: 0, Time elapsed: 9.944 sec
        [junit] Running org.apache.hadoop.mapred.TestFairSchedulerEventLog
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.26 sec
        [junit] Running org.apache.hadoop.mapred.TestFairSchedulerPoolNames
        [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 22.126 sec
        [junit] Running org.apache.hadoop.mapred.TestFairSchedulerSystem
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 45.947 sec
        [junit] Running org.apache.hadoop.mapred.TestPoolPlacementPolicy
        [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.133 sec

        checkfailure:

        BUILD SUCCESSFUL
        Total time: 1 minute 23 seconds

        Show
        Ted Malaska added a comment - Just ran it again to double check: /home/cloudera/svn/hadoop-branch-1/src/contrib/fairscheduler [root@localhost fairscheduler] # ant test Buildfile: build.xml check-contrib: init: [echo] contrib: fairscheduler init-contrib: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: /home/cloudera/svn/hadoop-branch-1/ivy/ivy-2.1.0.jar [get] Not modified - so not downloaded ivy-probe-antlib: ivy-init-antlib: ivy-init: [ivy:configure] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ :: [ivy:configure] :: loading settings :: file = /home/cloudera/svn/hadoop-branch-1/ivy/ivysettings.xml ivy-resolve-common: ivy-retrieve-common: [ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead [ivy:cachepath] :: loading settings :: file = /home/cloudera/svn/hadoop-branch-1/ivy/ivysettings.xml compile: [echo] contrib: fairscheduler [javac] Compiling 1 source file to /home/cloudera/svn/hadoop-branch-1/build/contrib/fairscheduler/classes [javac] Note: /home/cloudera/svn/hadoop-branch-1/src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. compile-examples: compile-test: [echo] contrib: fairscheduler test: [echo] contrib: fairscheduler [delete] Deleting directory /home/cloudera/svn/hadoop-branch-1/build/contrib/fairscheduler/test/logs [mkdir] Created dir: /home/cloudera/svn/hadoop-branch-1/build/contrib/fairscheduler/test/logs [junit] Running org.apache.hadoop.mapred.TestCapBasedLoadManager [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.419 sec [junit] Running org.apache.hadoop.mapred.TestComputeFairShares [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 0.108 sec [junit] Running org.apache.hadoop.mapred.TestFairScheduler [junit] Tests run: 39, Failures: 0, Errors: 0, Time elapsed: 9.944 sec [junit] Running org.apache.hadoop.mapred.TestFairSchedulerEventLog [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.26 sec [junit] Running org.apache.hadoop.mapred.TestFairSchedulerPoolNames [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 22.126 sec [junit] Running org.apache.hadoop.mapred.TestFairSchedulerSystem [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 45.947 sec [junit] Running org.apache.hadoop.mapred.TestPoolPlacementPolicy [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.133 sec checkfailure: BUILD SUCCESSFUL Total time: 1 minute 23 seconds
        Hide
        Sandy Ryza added a comment -

        I just committed this to branch-1. Thanks Ted!

        Show
        Sandy Ryza added a comment - I just committed this to branch-1. Thanks Ted!
        Hide
        Ted Malaska added a comment -

        Thx u Sandy.

        Show
        Ted Malaska added a comment - Thx u Sandy.

          People

          • Assignee:
            Ted Malaska
            Reporter:
            Sandy Ryza
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development