Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-546

Provide sample fair scheduler config file in conf/ and use it by default if no other config file is specified

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The capacity scheduler includes a config file template in hadoop/conf, so it would make sense to create a similar one for the fair scheduler and mention it in the README.

      1. mapreduce-546.patch
        16 kB
        Matei Zaharia
      2. mapreduce-546-v1.patch
        15 kB
        Matei Zaharia

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #29 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/29/)

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #29 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/29/ )
        Hide
        Matei Zaharia added a comment -

        I've committed this. Thanks for the review, Tom!

        Show
        Matei Zaharia added a comment - I've committed this. Thanks for the review, Tom!
        Hide
        Tom White added a comment -

        +1

        I think having a configuration that works like the other configuration in Hadoop is a win for users, even if it is more verbose. And you would get improvements like HADOOP-5670 for free. But this is a discussion for another Jira - the current patch is fine for now.

        Show
        Tom White added a comment - +1 I think having a configuration that works like the other configuration in Hadoop is a win for users, even if it is more verbose. And you would get improvements like HADOOP-5670 for free. But this is a discussion for another Jira - the current patch is fine for now.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12412768/mapreduce-546-v1.patch
        against trunk revision 791909.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/365/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/365/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/365/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/365/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12412768/mapreduce-546-v1.patch against trunk revision 791909. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/365/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/365/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/365/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/365/console This message is automatically generated.
        Hide
        Matei Zaharia added a comment -

        I've thought a bit about using a Configuration for the pools file, but I'm afraid it would become very verbose. A config file like this:

        <allocations>
          <pool name="ads">
             <minMaps>10</minMaps>
             <minReduces>5</minReduces>
             <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
          </pool>
          <user name="bob">
             <maxRunningJobs>2</maxRunningJobs>
          </user>
        </allocations>
        

        Would become something like this:

        <configuration>
          <property>
             <name>mapred.fairscheduler.pool.ads.minMaps</name>
             <value>10</value>
          </property>
          <property>
             <name>mapred.fairscheduler.pool.ads.minReduces</name>
             <value>5</value>
          </property>
          <property>
             <name>mapred.fairscheduler.pool.ads.minSharePreemptionTimeout</name>
             <value>600</value>
          </property>
          <property>
             <name>mapred.fairscheduler.user.bob.maxRunningJobs</name>
             <value>2</value>
          </property>
        </configuration>
        

        I find the first one more readable and more maintainable, especially as properties relating to the same pool are grouped together. Do you think the code reuse benefits of using Configuration outweigh the loss in usability? I actually don't think the configuration reading code will be much smaller using Configuration because we'd have to parse string names like mapred.fairscheduler.pool.ads.minReduces instead of parsing XML.

        Show
        Matei Zaharia added a comment - I've thought a bit about using a Configuration for the pools file, but I'm afraid it would become very verbose. A config file like this: <allocations> <pool name= "ads" > <minMaps>10</minMaps> <minReduces>5</minReduces> <minSharePreemptionTimeout>300</minSharePreemptionTimeout> </pool> <user name= "bob" > <maxRunningJobs>2</maxRunningJobs> </user> </allocations> Would become something like this: <configuration> <property> <name>mapred.fairscheduler.pool.ads.minMaps</name> <value>10</value> </property> <property> <name>mapred.fairscheduler.pool.ads.minReduces</name> <value>5</value> </property> <property> <name>mapred.fairscheduler.pool.ads.minSharePreemptionTimeout</name> <value>600</value> </property> <property> <name>mapred.fairscheduler.user.bob.maxRunningJobs</name> <value>2</value> </property> </configuration> I find the first one more readable and more maintainable, especially as properties relating to the same pool are grouped together. Do you think the code reuse benefits of using Configuration outweigh the loss in usability? I actually don't think the configuration reading code will be much smaller using Configuration because we'd have to parse string names like mapred.fairscheduler.pool.ads.minReduces instead of parsing XML.
        Hide
        Matei Zaharia added a comment -

        Here's a new patch that uses Configuration.getResource instead of the classloader code.

        Show
        Matei Zaharia added a comment - Here's a new patch that uses Configuration.getResource instead of the classloader code.
        Hide
        Tom White added a comment -

        This looks reasonable to me. Could you use Configuration#getResource(String) or is there a classloader problem?

        Longer term, it would be nice if the fair scheduler could migrate to using a Hadoop Configuration just like the capacity scheduler does.

        Show
        Tom White added a comment - This looks reasonable to me. Could you use Configuration#getResource(String) or is there a classloader problem? Longer term, it would be nice if the fair scheduler could migrate to using a Hadoop Configuration just like the capacity scheduler does.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12412664/mapreduce-546.patch
        against trunk revision 791418.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/360/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/360/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/360/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/360/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12412664/mapreduce-546.patch against trunk revision 791418. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/360/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/360/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/360/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/360/console This message is automatically generated.
        Hide
        Matei Zaharia added a comment -

        Here's a patch for this issue that makes the scheduler use fair-scheduler.xml off the classpath if no other allocation file is specified through mapred.fairscheduler.allocation.file. (We keep this parameter for backwards compatibility).

        The only tricky part was using the URL from the ClassLoader's getResource method instead of a String for the path to the file. I followed the example set in Configuration of having an Object as the allocation file name that may be either an URL or a String, because I didn't want to force users of the existing jobconf parameter to supply a file:// URL, and I also didn't want to append file:// in front (this doesn't work with relative paths, and although the docs have said to use an absolute path since the scheduler was released, it could be confusing).

        I haven't included a unit test because the code changes are minor and I can't think of an easy way to unit test this. However, I did manually test that configurations are found whether the default config file fair-scheduler.xml is used or another file is specified through the config parameter, and also that the allocation file is reloaded at runtime in both situations.

        I've also updated the docs to reflect the new functionality and emptied-out the fair-scheduler.xml template so that it creates no pools by default. The docs cover all the features that were described in the old fair-scheduler.xml.template.

        Show
        Matei Zaharia added a comment - Here's a patch for this issue that makes the scheduler use fair-scheduler.xml off the classpath if no other allocation file is specified through mapred.fairscheduler.allocation.file. (We keep this parameter for backwards compatibility). The only tricky part was using the URL from the ClassLoader's getResource method instead of a String for the path to the file. I followed the example set in Configuration of having an Object as the allocation file name that may be either an URL or a String, because I didn't want to force users of the existing jobconf parameter to supply a file:// URL, and I also didn't want to append file:// in front (this doesn't work with relative paths, and although the docs have said to use an absolute path since the scheduler was released, it could be confusing). I haven't included a unit test because the code changes are minor and I can't think of an easy way to unit test this. However, I did manually test that configurations are found whether the default config file fair-scheduler.xml is used or another file is specified through the config parameter, and also that the allocation file is reloaded at runtime in both situations. I've also updated the docs to reflect the new functionality and emptied-out the fair-scheduler.xml template so that it creates no pools by default. The docs cover all the features that were described in the old fair-scheduler.xml.template.
        Hide
        Matei Zaharia added a comment -

        An update on this issue: MAPREDUCE-551 added a fair-scheduler.xml.template but didn't make the mapred.fairscheduler.allocation.file point to it by default. I'll try to figure out how to do that (mainly how to find the path of the conf dir from inside the fair scheduler).

        Show
        Matei Zaharia added a comment - An update on this issue: MAPREDUCE-551 added a fair-scheduler.xml.template but didn't make the mapred.fairscheduler.allocation.file point to it by default. I'll try to figure out how to do that (mainly how to find the path of the conf dir from inside the fair scheduler).

          People

          • Assignee:
            Matei Zaharia
            Reporter:
            Matei Zaharia
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development