Hadoop Map/Reduce
MAPREDUCE-936

Allow a load difference in fairshare scheduler

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: contrib/fair-share
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      fb

      Description

      The problem we are facing: It takes a long time for all tasks of a job to get scheduled on the cluster, even if the cluster is almost empty.

      There are two reasons that together lead to this situation:
      1. The load factor makes sure each TaskTracker (TT) runs the same number of tasks. (This is the part that this patch tries to change.)

      2. The scheduler tries to schedule map tasks locally (first node-local, then rack-local). There is a wait time (mapred.fairscheduler.localitywait.node and mapred.fairscheduler.localitywait.rack, both around 10 seconds in our configuration) and an accumulated wait time (JobInfo.localityWait). The accumulated wait time is reset to 0 whenever a non-local map task is scheduled, so it takes N * wait_time to schedule N non-local map tasks.
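
      The N * wait_time cost in point 2 can be illustrated with a small toy model. It is only a sketch, not the scheduler's code: the class and method names are hypothetical, and it assumes that every one of the N tasks has to be launched non-locally and that the wait is a flat 10 seconds.

```java
/** Toy model of the accumulated locality wait (hypothetical names, not scheduler code). */
class LocalityWaitSketch {

    /**
     * Time in milliseconds needed to launch n non-local map tasks when the
     * accumulated wait is reset to 0 after every non-local launch, so that
     * each launch has to sit out the full wait again.
     */
    static long timeWithReset(int n, long waitMs) {
        long elapsedMs = 0;
        for (int i = 0; i < n; i++) {
            // The accumulated wait starts from 0 and must reach waitMs
            // before the next non-local launch is allowed.
            elapsedMs += waitMs;
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Assumed example: 12 non-local tasks with a 10-second wait.
        System.out.println(timeWithReset(12, 10_000L) + " ms"); // prints "120000 ms"
    }
}
```

      Under these assumptions, a dozen non-local tasks are already enough to reach the two-minute range.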

      Because of 1, many TTs cannot take more tasks even when they have free slots. As a result, many of the map tasks cannot be scheduled locally.

      Because of 2, it's really hard to schedule a non-local task.

      As a result, we sometimes see it take more than 2 minutes to schedule all the mappers of a job.
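
      One way to picture the change to point 1 is a per-TaskTracker cap that follows the cluster-wide load but tolerates a configurable difference. The sketch below is an illustration under assumptions, not the attached patch: the class name, the maxDiff parameter and the exact formula are hypothetical; only the idea of allowing a load difference comes from this issue.

```java
/** Sketch of a per-TaskTracker task cap with an allowed load difference (hypothetical). */
class LoadCapSketch {

    /**
     * @param totalRunnableTasks tasks that want to run across the whole cluster
     * @param totalSlots         slots across the whole cluster
     * @param localMaxTasks      slots on this TaskTracker
     * @param maxDiff            allowed load difference as a fraction of local slots;
     *                           0.0 means every tracker is held to the cluster load
     * @return how many tasks this TaskTracker may run at the same time
     */
    static int cap(int totalRunnableTasks, int totalSlots, int localMaxTasks, double maxDiff) {
        if (totalSlots <= 0) {
            return 0; // no capacity reported yet
        }
        double clusterLoad = (double) totalRunnableTasks / totalSlots;
        // Let the tracker run above the cluster load by maxDiff, never above 100%.
        double allowedLoad = Math.min(1.0, clusterLoad + maxDiff);
        return (int) Math.ceil(localMaxTasks * allowedLoad);
    }

    public static void main(String[] args) {
        // Assumed example: 20 runnable tasks, 100 cluster slots, a tracker with 4 slots.
        System.out.println(cap(20, 100, 4, 0.0)); // strict equal load: 1
        System.out.println(cap(20, 100, 4, 0.4)); // with a load difference: 3
    }
}
```

      With a difference of 0 the tracker in this example stops at one task although it has four slots, which is the situation described above; a larger difference lets a tracker that holds local data take more of the job's tasks.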

      1. MAPREDUCE-936.1.patch
        1 kB
        Zheng Shao
      2. MAPREDUCE-936.2.patch
        7 kB
        Zheng Shao

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #75 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/)

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #16 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/16/)
        Allow a load difference for fairshare scheduler.
        (Zheng Shao via dhruba)

        dhruba borthakur added a comment -

        I just committed this. Thanks Zheng.

        Matei Zaharia added a comment -

        +1 looks good, feel free to commit it. Thanks Zheng!

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12418583/MAPREDUCE-936.2.patch
        against trunk revision 811134.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/38/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/38/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/38/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/38/console

        This message is automatically generated.

        Zheng Shao added a comment -

        Added a unit test.

        Tested with

        ant test -Dtestcase=TestCapBasedLoadManager
        
        dhruba borthakur added a comment -

        Hi Zheng, I would appreciate it a lot if you could provide a unit test for this one. Thanks.

        Matei Zaharia added a comment -

        Hi Zheng,

        For issue 1, the provided patch looks good. It might be nice to add a unit test for it though.

        For issue 2, I believe the implementation of locality waits in MAPREDUCE-706 has solved the issue. In that implementation, once a job has launched a non-local task, it can keep launching non-local tasks right away without further waits. However, if it ever manages to launch a local task again, it has to wait again before it can launch non-local tasks. The reasoning for this is that maybe the job had just been unlucky earlier and still has lots of tasks left to launch, and we don't want it to stay stuck at the non-local level.

        I think the locality wait code you guys are running at Facebook is much older than the one in MAPREDUCE-706, so it would be nice if you could upgrade to MAPREDUCE-706 when you upgrade Hadoop in general. I believe it would not be too difficult to port the trunk version of the fair scheduler to 0.20 and get all the architectural changes and improvements in 706 with that.

        Matei
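
        The behaviour described in the comment above can be sketched as a small state machine. This is one reading of that comment, not the MAPREDUCE-706 code: the class, its fields and its method names are hypothetical, and node-local and rack-local are collapsed into a single "local" level to keep the example short.

```java
/** Sketch of the locality-wait behaviour described above (hypothetical names). */
class LocalityLevelSketch {
    private final long waitMs;
    private boolean nonLocalAllowed = false;
    private long waitStartMs; // job start, or time of the most recent local launch

    LocalityLevelSketch(long waitMs, long nowMs) {
        this.waitMs = waitMs;
        this.waitStartMs = nowMs;
    }

    /** True if the job may launch a non-local task at time nowMs. */
    boolean mayLaunchNonLocal(long nowMs) {
        if (!nonLocalAllowed && nowMs - waitStartMs >= waitMs) {
            nonLocalAllowed = true; // waited long enough, move to the non-local level
        }
        return nonLocalAllowed;
    }

    /** A non-local launch does not reset anything: the job stays at the non-local level. */
    void launchedNonLocal(long nowMs) {
        // intentionally empty
    }

    /** A local launch drops the job back to the local level and restarts the wait. */
    void launchedLocal(long nowMs) {
        nonLocalAllowed = false;
        waitStartMs = nowMs;
    }
}
```

        In this model a job waits once, then launches any number of non-local tasks back to back, and only has to wait again after it gets a local task.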


          People

          • Assignee:
            Zheng Shao
            Reporter:
            Zheng Shao
          • Votes:
            0
            Watchers:
            8
