Hadoop Common / HADOOP-4789

Change fair scheduler to share between pools by default, not between individual jobs

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Changed fair scheduler to divide resources equally between pools, not jobs.

      Description

      By default, the fair scheduler currently treats jobs as equal entities when sharing, so a user who submits 2 jobs gets 2x the share of a user who submits only 1 job. We found that it makes more sense to give equal shares to pools instead, with one pool per user, because otherwise users can game the system by submitting multiple small jobs. This patch makes the scheduler share between pools by default and sets the default pool assignment to one pool per user. It will also be possible to give pools weights so that some users, groups, or applications get a larger share of the cluster if they really do need to run more jobs.
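To make the difference concrete, here is a minimal Java sketch (not the fair scheduler's actual code) of how a fixed number of slots would be split under each policy. The class, record, and method names are hypothetical; pools are assumed to be one per user, each pool's share is assumed to be split equally among its jobs, and pool weights are assumed to default to 1.0.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch comparing per-job and per-pool fair sharing of a fixed
// number of task slots. Names are illustrative, not the scheduler's real classes.
public class FairShareSketch {

    // A running job and the pool (here, the submitting user) it belongs to.
    record Job(String name, String pool) {}

    // Old default: every job is an equal entity, so each job gets slots / jobs.
    static Map<String, Double> perJobShares(List<Job> jobs, int slots) {
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Job job : jobs) {
            shares.put(job.name(), (double) slots / jobs.size());
        }
        return shares;
    }

    // New default: slots are split between active pools in proportion to pool
    // weight (assumed default 1.0), then each pool's share is split equally
    // among that pool's jobs.
    static Map<String, Double> perPoolShares(List<Job> jobs, Map<String, Double> poolWeights, int slots) {
        // Count the jobs in each pool that currently has work.
        Map<String, Integer> jobsPerPool = new LinkedHashMap<>();
        for (Job job : jobs) {
            jobsPerPool.merge(job.pool(), 1, Integer::sum);
        }
        // Sum the weights of the active pools.
        double totalWeight = 0;
        for (String pool : jobsPerPool.keySet()) {
            totalWeight += poolWeights.getOrDefault(pool, 1.0);
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Job job : jobs) {
            double poolShare = slots * poolWeights.getOrDefault(job.pool(), 1.0) / totalWeight;
            shares.put(job.name(), poolShare / jobsPerPool.get(job.pool()));
        }
        return shares;
    }

    public static void main(String[] args) {
        // User "alice" submits two jobs, user "bob" submits one; 12 slots in total.
        List<Job> jobs = List.of(
            new Job("alice-1", "alice"),
            new Job("alice-2", "alice"),
            new Job("bob-1", "bob"));
        System.out.println("per job:  " + perJobShares(jobs, 12));
        System.out.println("per pool: " + perPoolShares(jobs, Map.of(), 12));
        // A weight of 2.0 on bob's pool doubles its share relative to alice's.
        System.out.println("weighted: " + perPoolShares(jobs, Map.of("bob", 2.0), 12));
    }
}
```

With 12 slots, per-job sharing gives the user with two jobs 8 slots and the other user 4; per-pool sharing gives each user 6, and a weight of 2.0 on bob's pool shifts the split to 4 for alice and 8 for bob.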

        Activity

        Matei Zaharia added a comment -

        Here is a patch for this issue with included documentation and a unit test. As mentioned in the description, the patch makes sharing happen on a per-pool rather than per-job basis, with each pool's share then divided among the jobs in that pool. It also sets the default poolnameproperty to user.name, i.e. one pool for each user (which seemed a reasonable setting in our use of the scheduler at Facebook). Finally, it allows giving pools different weights in the fair share through a weight element in the scheduler config XML.
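The pool assignment and per-pool weights described in this comment can be sketched in Java as follows. This is a standalone illustration, not the patch itself: plain maps stand in for Hadoop's job and scheduler configuration, and the property key `mapred.fairscheduler.poolnameproperty`, the fallback pool name `default`, and the `<allocations>` / `<pool name="...">` / `<weight>` file layout are assumptions to check against the fair scheduler documentation for your release.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of pool assignment and weight parsing; not the real
// fair scheduler classes.
public class PoolConfigSketch {

    // Assumed property names; verify against the docs for your Hadoop release.
    static final String POOL_NAME_PROPERTY_KEY = "mapred.fairscheduler.poolnameproperty";
    static final String DEFAULT_POOL_NAME_PROPERTY = "user.name";

    // Looks up which job property names the pool (default: user.name), then
    // reads that property from the job's configuration; "default" if absent.
    static String poolForJob(Map<String, String> schedulerConf, Map<String, String> jobConf) {
        String property = schedulerConf.getOrDefault(POOL_NAME_PROPERTY_KEY, DEFAULT_POOL_NAME_PROPERTY);
        return jobConf.getOrDefault(property, "default");
    }

    // Collects <pool name="..."><weight>...</weight></pool> entries; pools
    // without a <weight> element are left out (treated as weight 1.0 elsewhere).
    static Map<String, Double> parseWeights(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Double> weights = new HashMap<>();
            NodeList pools = doc.getElementsByTagName("pool");
            for (int i = 0; i < pools.getLength(); i++) {
                Element pool = (Element) pools.item(i);
                NodeList weightNodes = pool.getElementsByTagName("weight");
                if (weightNodes.getLength() > 0) {
                    weights.put(pool.getAttribute("name"),
                        Double.parseDouble(weightNodes.item(0).getTextContent().trim()));
                }
            }
            return weights;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse allocations XML", e);
        }
    }

    public static void main(String[] args) {
        String allocations =
            "<?xml version=\"1.0\"?>"
            + "<allocations>"
            + "<pool name=\"etl\"><weight>2.0</weight></pool>"
            + "<pool name=\"adhoc\"/>"
            + "</allocations>";
        // Nothing set in the scheduler config: the pool comes from user.name.
        System.out.println(poolForJob(Map.of(), Map.of("user.name", "alice"))); // alice
        System.out.println(parseWeights(allocations)); // {etl=2.0}
    }
}
```

Because the pool property is itself configurable, an administrator could point it at a different job property (for example a group name) instead of `user.name` without changing how weights are applied.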

        Tom White added a comment -

        +1 Looks good to me.

        A minor nit: rather than refer to the Capacity Scheduler's Jira issue, I would point to the src/contrib/capacity-scheduler directory.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12395464/hadoop-4789-v1.patch
        against trunk revision 724883.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3697/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3697/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3697/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3697/console

        This message is automatically generated.

        Matei Zaharia added a comment -

        The failed tests seem to be unrelated to the fair scheduler, so I'm resubmitting the patch to go through Hudson.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12395464/hadoop-4789-v1.patch
        against trunk revision 726129.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3742/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3742/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3742/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3742/console

        This message is automatically generated.

        Matei Zaharia added a comment -

        Unrelated tests still seem to be failing; not sure what to do about it.

        Hemanth Yamijala added a comment -

        The JobTracker restart failures are being tracked in HADOOP-4716. We will work on fixing them soon.

        dhruba borthakur added a comment -

        It would be nice to have this fix rolled into 0.19.

        Matei Zaharia added a comment -

        Resubmitted patch for Hudson.

        dhruba borthakur added a comment -

        I think we should get this one committed.

        Matei Zaharia added a comment -

        I just committed this. The two failed tests were a deadlock in unrelated code due to https://issues.apache.org/jira/browse/HADOOP-4977 and a Chukwa test, which now passes. We have also been running this in production at Facebook.

        Robert Chansler added a comment -

        Edit release note for publication.

          People

          • Assignee:
            Matei Zaharia
            Reporter:
            Matei Zaharia
          • Votes:
            0
            Watchers:
            6

            Dates

            • Created:
              Updated:
              Resolved:
