Hadoop Map/Reduce
MAPREDUCE-517

The capacity-scheduler should assign multiple tasks per heartbeat

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.204.0, 0.23.0
    • Component/s: None
    • Labels: None

      Description

      HADOOP-3136 changed the default o.a.h.mapred.JobQueueTaskScheduler to assign multiple tasks per TaskTracker heartbeat; the capacity-scheduler should do the same.
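
For background, the behaviour HADOOP-3136 gave the default scheduler can be sketched roughly as follows. This is a hypothetical, simplified illustration, not the actual JobQueueTaskScheduler code: MultiAssignSketch, TaskPicker, pickLocalMap and pickOffSwitchMap are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: given the free map slots reported in one TaskTracker heartbeat,
// hand out several local map tasks but at most one off-switch (non-local) map task.
class MultiAssignSketch {

  interface TaskPicker {
    String pickLocalMap();      // returns a task id, or null if none is available
    String pickOffSwitchMap();  // returns a task id, or null if none is available
  }

  static List<String> assignMaps(int freeMapSlots, TaskPicker picker) {
    List<String> assigned = new ArrayList<>();
    boolean offSwitchGiven = false;
    for (int slot = 0; slot < freeMapSlots; slot++) {
      String task = picker.pickLocalMap();
      if (task == null && !offSwitchGiven) {
        task = picker.pickOffSwitchMap();   // at most one off-switch task per heartbeat
        offSwitchGiven = (task != null);
      }
      if (task == null) {
        break;                              // nothing left to schedule this heartbeat
      }
      assigned.add(task);
    }
    return assigned;
  }
}
```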

      Attachments

      1. MAPREDUCE-517_yhadoop20.patch
        40 kB
        Arun C Murthy
      2. MAPREDUCE-517_yhadoop20.patch
        40 kB
        Arun C Murthy
      3. MAPREDUCE-517_yhadoop20.patch
        100 kB
        Arun C Murthy
      4. MAPREDUCE-517_yhaddop20.patch
        22 kB
        Arun C Murthy
      5. HADOOP-5090-20090604.txt
        46 kB
        Vinod Kumar Vavilapalli
      6. HADOOP-5090-20090506.txt
        39 kB
        Vinod Kumar Vavilapalli
      7. HADOOP-5090-20090504.txt
        37 kB
        Vinod Kumar Vavilapalli

          Activity

          Vinod Kumar Vavilapalli added a comment -

          Attaching patch. With this patch, CapacityTaskScheduler assigns multiple tasks in a single heartbeat. It assigns multiple maps just like the default scheduler - multiple local tasks and at most one off-switch task - and multiple reduces. It also keeps track of the tasks already chosen for assignment in a particular scheduling iteration, so that high-memory jobs stay blocked from scheduling and user limits are respected while giving away multiple tasks.

          The patch also has test-cases. Benchmarking is in progress.
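
A minimal sketch of the per-iteration bookkeeping described above, assuming made-up names (IterationTracker, recordAssignment, withinUserLimit); the real CapacityTaskScheduler data structures differ:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tasks handed out earlier in the same heartbeat have not been
// reported back yet, so they are counted explicitly when checking user limits.
class IterationTracker {

  private final Map<String, Integer> assignedThisIteration = new HashMap<>();

  void reset() {                       // called at the start of each scheduling iteration
    assignedThisIteration.clear();
  }

  void recordAssignment(String user) { // called every time a task is given away
    assignedThisIteration.merge(user, 1, Integer::sum);
  }

  boolean withinUserLimit(String user, int runningTasks, int userLimit) {
    int pending = assignedThisIteration.getOrDefault(user, 0);
    return runningTasks + pending < userLimit;
  }
}
```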

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407144/HADOOP-5090-20090504.txt
          against trunk revision 771505.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/288/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/288/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/288/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/288/console

          This message is automatically generated.

          Vinod Kumar Vavilapalli added a comment -

          Updated patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407314/HADOOP-5090-20090506.txt
          against trunk revision 772960.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/304/console

          This message is automatically generated.

          Vinod Kumar Vavilapalli added a comment -

          Updated patch. This has to be applied over the latest patch for HADOOP-5884.

          Arun C Murthy added a comment -

          I'd strongly urge against assigning multiple reduces per heartbeat. When I did it in HADOOP-3136 it caused bad imbalances with reduces... e.g. consider 2 jobs - one with 'small' reduces and the other with 'heavy' reduces. If we assign multiple reduces, then a portion of the cluster (tasktrackers) will run the 'small' reduces and the others will run the 'heavy' reduces, leading to bad imbalances in load across the machines. Given that, we decided to assign only 1 reduce per heartbeat with HADOOP-3136 to achieve better load balance.
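
To make the concern concrete: a TaskTracker with, say, 4 free reduce slots that heartbeats while the 'small'-reduce job is at the head of the queue would grab 4 small reduces at once, while another tracker fills up with 4 'heavy' reduces. Capping reduces at one per heartbeat interleaves the two jobs across trackers. A minimal sketch of that cap, using a made-up pickReduceTask supplier rather than any real scheduler API:

```java
import java.util.function.Supplier;

// Hypothetical sketch: assign at most one reduce per heartbeat, regardless of how many
// reduce slots are free, so differently sized reduces spread across TaskTrackers.
class SingleReduceSketch {
  static String assignReduce(int freeReduceSlots, Supplier<String> pickReduceTask) {
    if (freeReduceSlots <= 0) {
      return null;                 // no room on this TaskTracker
    }
    return pickReduceTask.get();   // one reduce at most, even with several free slots
  }
}
```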

          Arun C Murthy added a comment -

          Also, I really don't think having a config option to control whether one or many tasks are assigned is a good idea. We should just stick with multiple assignments.

          Arun C Murthy added a comment -

          Updated patch for y20; it also incorporates MAPREDUCE-538.

          Arun C Murthy added a comment -

          Updated patch for y20.

          Highlights:

          1. CS assigns multiple tasks per heartbeat, with at most 1 off-switch task per heartbeat.
          2. I've also incorporated MAPREDUCE-538 to ensure jobs at the head of the queue do not aggressively grab tasks and hurt locality for others (see the sketch after this list).
            1. The implementation tracks the 'number of scheduling opportunities' missed by a job, and jobs use that count to prevent starvation.
            2. I've also added 'pace' to the back-off by ensuring jobs do not back off as aggressively as they make progress.
            3. The patch also gets small jobs to back off less vis-a-vis larger jobs by ensuring the back-off considers the number of maps in the job.
            4. The patch also ensures jobs with no locality, e.g. sleep-job/randomwriter, ignore the back-off, since it doesn't make sense for them.
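
A rough sketch of the back-off in item 2, with hypothetical names and arbitrary example thresholds rather than the values in the actual patch:

```java
// Hypothetical sketch: a job may take an off-switch map only after it has missed "enough"
// scheduling opportunities; the threshold is smaller for small jobs, decays as the job
// progresses, and is skipped entirely for jobs with no locality information.
class LocalityBackoffSketch {
  static boolean allowOffSwitch(int missedSchedulingOpportunities,
                                int totalMaps,
                                double mapProgress,          // fraction of maps done, 0.0 to 1.0
                                boolean jobHasLocalityInfo) {
    if (!jobHasLocalityInfo) {
      return true;                 // e.g. sleep-job/randomwriter: backing off buys nothing
    }
    // Base threshold scales with the number of maps, so small jobs back off less
    // (the cap of 100 is an arbitrary illustrative value).
    double threshold = Math.min(totalMaps, 100);
    // The threshold decays as the job makes progress, so back-off gets less aggressive.
    threshold *= (1.0 - mapProgress);
    return missedSchedulingOpportunities >= threshold;
  }
}
```
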
          Arun C Murthy added a comment -

          Minor tweak to ensure off-switch tasks are scheduled more aggressively for high-ram maps.
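
A hedged guess at what such a tweak could look like, continuing the hypothetical LocalityBackoffSketch above; the actual change in the patch may well be different:

```java
// Hypothetical: high-memory maps already wait for enough slots/memory on a TaskTracker,
// so the locality back-off is skipped for them instead of delaying them even further.
class HighRamSketch {
  static boolean allowOffSwitchForJob(boolean isHighMemoryJob,
                                      int missedOpportunities, int totalMaps,
                                      double mapProgress, boolean hasLocalityInfo) {
    if (isHighMemoryJob) {
      return true;   // schedule off-switch aggressively rather than hold out for locality
    }
    return LocalityBackoffSketch.allowOffSwitch(
        missedOpportunities, totalMaps, mapProgress, hasLocalityInfo);
  }
}
```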

          Arun C Murthy added a comment -

          Updated patch to fix TestCapacityScheduler.

          Harsh J added a comment -

          Thanks for marking, Luke!

          Owen O'Malley added a comment -

          Hadoop 0.20.204.0 was just released.


            People

            • Assignee: Arun C Murthy
            • Reporter: Arun C Murthy
            • Votes: 0
            • Watchers: 9
