Hadoop Map/Reduce / MAPREDUCE-1030

Reduce tasks are getting starved in capacity scheduler

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: capacity-sched
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Modified the scheduling logic in capacity scheduler to return a map and a reduce task per heartbeat.
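Editor's note: the release note above can be sketched as follows. This is an illustrative sketch, not the actual CapacityTaskScheduler code; the class and method bodies are hypothetical stand-ins showing the behavior change from one task per heartbeat (map-first) to up to one map and one reduce per heartbeat.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (NOT the real CapacityTaskScheduler): the fix changes
// assignTasks from returning a single task per heartbeat, with maps always
// tried first, to returning up to one map AND one reduce per heartbeat, so
// a steady stream of runnable maps can no longer starve the reduces.
public class HeartbeatSketch {

    // Old behavior: one task per heartbeat, maps first.
    static String assignOneTask(List<String> maps, List<String> reduces) {
        if (!maps.isEmpty()) return maps.remove(0);
        if (!reduces.isEmpty()) return reduces.remove(0);
        return null;
    }

    // New behavior: up to one map and one reduce per heartbeat.
    static List<String> assignTasks(List<String> maps, List<String> reduces) {
        List<String> assigned = new ArrayList<>();
        if (!maps.isEmpty()) assigned.add(maps.remove(0));
        if (!reduces.isEmpty()) assigned.add(reduces.remove(0));
        return assigned;
    }

    public static void main(String[] args) {
        List<String> maps = new ArrayList<>(List.of("m1", "m2"));
        List<String> reduces = new ArrayList<>(List.of("r1"));
        System.out.println(assignTasks(maps, reduces)); // [m1, r1]
    }
}
```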

      Description

      Reduce tasks are getting starved in the capacity scheduler.

      1. MAPREDUCE-1030-1.patch
        3 kB
        rahul k singh
      2. MAPREDUCE-1030-2.patch.txt
        22 kB
        rahul k singh
      3. MAPREDUCE-1030-3.patch
        55 kB
        rahul k singh
      4. MAPREDUCE-1030-4.patch
        61 kB
        rahul k singh
      5. MAPREDUCE-1030-5.patch
        63 kB
        Hemanth Yamijala
      6. MAPREDUCE-1030-6.patch
        96 kB
        rahul k singh
      7. MAPREDUCE-1030-7.patch
        111 kB
        Hemanth Yamijala
      8. MAPREDUCE-1030-version20.patch
        18 kB
        rahul k singh

        Activity

        Vinod Kumar Vavilapalli added a comment -

        This is a serious issue. It happens when using the capacity scheduler with TTs in the cluster configured with more map slots than reduce slots, e.g. in a 5:2 ratio.

        At times, when there is a steady stream of short jobs with many more maps than reduces across all jobs, jobs start hanging between the map and reduce phases. Reduces are starved: (short) maps keep finishing quickly and new maps keep getting assigned to TTs, while reduces of all the jobs hang for hours, sometimes increasing total job execution time by as much as 10X. This has to be fixed.
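Editor's note: the starvation dynamic described above can be made concrete with a small simulation. This is illustrative only, not scheduler code; it assumes the pre-fix behavior of at most one task per heartbeat, with maps always tried first.

```java
// Illustrative simulation of the starvation scenario: each short map finishes
// before the next heartbeat, so a runnable map always wins the heartbeat and
// a waiting reduce is never reached until the map stream drains.
public class StarvationSketch {

    // Returns true iff a single pending reduce gets scheduled within the
    // given number of heartbeats, with numMaps short maps also pending and
    // each heartbeat assigning at most one task, map-first.
    static boolean reduceScheduled(int numMaps, int heartbeats) {
        int pendingMaps = numMaps;
        for (int hb = 0; hb < heartbeats; hb++) {
            if (pendingMaps > 0) {
                pendingMaps--;   // a map wins this heartbeat
            } else {
                return true;     // reduce finally assigned
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A steady backlog of short maps: the reduce starves indefinitely.
        System.out.println(reduceScheduled(100, 50)); // false
        // Once the maps drain, the reduce is eventually scheduled.
        System.out.println(reduceScheduled(10, 50));  // true
    }
}
```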

        Vinod Kumar Vavilapalli added a comment -

        This fix may need to be pushed to 0.20.2 as well.

        Arun C Murthy added a comment -

        This happens when using capacity-scheduler with TTs in the cluster configured with more map slots than reduce slots, e.g. in a 5:2 ratio.

        This has more to do with the ratio of maps to reduces in the system and less to do with slot ratios. If we had 500k maps (across all jobs) versus a few thousand reduces (across all jobs) this behaviour would be prevalent too.

        rahul k singh added a comment -

        Attaching the fix. The test cases are broken and need to be fixed; I will do that and upload a new patch.

        rahul k singh added a comment -

        Attaching the patch for 20 branch.

        rahul k singh added a comment -

        MAPREDUCE-1030-2.patch.txt is for the 20 branch. Sorry for not making the patch name intuitive enough.

        Arun C Murthy added a comment -

        Nit: With this change CapacityTaskScheduler.assignTasks returns an empty List when assignMultipleTasks is true and cannot find any task (map or reduce) to schedule... however, it returns null if assignMultipleTasks is false and cannot find a map or reduce to schedule.
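Editor's note: the inconsistency in this nit (empty list in one path, null in the other) is the classic case for normalizing to a single convention. The helper below is an illustrative sketch, not an actual Hadoop API.

```java
import java.util.Collections;
import java.util.List;

// Sketch of the nit above: callers of assignTasks would otherwise have to
// handle both null and an empty list for "nothing to schedule". Normalizing
// to an empty list gives one convention; normalize() is a hypothetical
// helper, not part of the real CapacityTaskScheduler.
public class AssignTasksNormalize {

    static <T> List<T> normalize(List<T> tasks) {
        return (tasks == null) ? Collections.<T>emptyList() : tasks;
    }

    public static void main(String[] args) {
        System.out.println(normalize(null).isEmpty());  // true
        System.out.println(normalize(List.of("m1")));   // [m1]
    }
}
```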

        rahul k singh added a comment -

        The patch also updates the existing test cases according to the new changes in the capacity scheduler.

        rahul k singh added a comment -

        Added some more test cases to test multiple assignment.

        Hemanth Yamijala added a comment -

        This patch updates documentation in CapacityTaskScheduler and TestCapacityScheduler to be in sync with the new behavior of assigning a map and reduce with each heartbeat.

        I did these changes along with the code review, as I felt that it would be easier to make these changes rather than provide them as feedback, which would have made the feedback unnecessarily verbose.

        I do have other review comments, not related to syncing up the documentation, that I will raise in a separate comment.

        Hemanth Yamijala added a comment -

        Some comments, all in test cases:

        • The intent of the test case, testMaxReduceCap, is slightly changed. Previously, we wanted to verify that after hitting the max reduce limit, finishing a map task has no effect. This check is missing.
        • testCapacityTransfer - can check that both maps and reduces are being assigned.
        • testHighMemoryBlockingWithMaxLimit can verify that all tasks are assigned as expected, and not just check for maps or reduces.
        • For the testcase, testUserLimitsForHighMemoryJobs, I think we can make it more readable if we put it in a loop, as it is repeating the same thing. Something like:
          for (int i=0; i<5; i++) {
            expectedStrings.clear();
            expectedStrings.put(
                CapacityTestUtils.MAP, "attempt_test_0001_m_00000"+(i+1)+"_0 on tt1");
            expectedStrings.put(
                CapacityTestUtils.REDUCE, "attempt_test_0001_r_00000"+(i+1)+"_0 on tt1");
            checkAssignmentTasks(taskTrackerManager, scheduler, "tt1",
                expectedStrings);
          }
          

          This will also avoid mistakes; for instance, in one of the scenarios here we should be using checkAssignmentTasks but are instead using checkAssignment.

        • In testSchedulingInformation, check for reduce stats being updated after job3 is scheduled.
        • testHighMemoryBlockingAcrossTaskTypes - after the first call to assignTasks, a map and reduce are both scheduled. We should have another call that will now assert that a reduce of the second job is scheduled.
        • testSpeculativeTaskScheduling: The assertion for fjob1.pendingReduces() to be 0 can be moved after the initial call to assignTasks itself with multiple task assignment. Also, we are not checking assignment of speculative reduces. After speculative tasks are handled, we should check that the normal job's reduces are also scheduled.
        • testMultiTaskAssignmentInSingleQueue() is no longer needed, as the same is being tested in many other test cases.
        • testMultiTaskAssignmentInMultipleQueues() needs a javadoc for explaining the test case, and should reuse the new utility APIs like checkAssignmentTasks().
        • checkRunningJobMovementAndCompletion can check multiple assignment.
        • checkAssignment - indentation seems wrong.
        • Since checkAssignment cannot be called if no tasks are returned, the check for tasks == null or empty seems unnecessary. Also, we can simply return the 0th element in the list of tasks.
        • We can rename checkAssignmentTasks to checkMultipleTaskAssignment() to better document intent. This also needs a javadoc. Also, indentation within the method needs some correction. Also, I think we should also check that the number of tasks assigned matches the number expected.
        • TestContainerQueue and TestRefreshOfQueues test classes can be easily modified for checking multiple assignment because of the utility APIs we have. This way, we can actually check reduce assignment for hierarchical queues is working as expected as well.
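Editor's note: the stricter check suggested in the review (rename to checkMultipleTaskAssignment, verify each expected attempt is present, and verify the assigned count matches the expected count) can be sketched as below. The method name comes from the review; the body is an illustrative stand-in for the CapacityTestUtils helper, not the committed code.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch of the stricter test helper proposed in the review:
// fail if the number of assigned tasks differs from the number expected,
// and fail if any expected attempt string is missing from the assignment.
public class CheckAssignmentSketch {

    static void checkMultipleTaskAssignment(Map<String, String> expected,
                                            List<String> assigned) {
        if (assigned.size() != expected.size()) {
            throw new AssertionError("expected " + expected.size()
                + " tasks, got " + assigned.size());
        }
        for (String attempt : expected.values()) {
            if (!assigned.contains(attempt)) {
                throw new AssertionError("missing attempt: " + attempt);
            }
        }
    }

    public static void main(String[] args) {
        checkMultipleTaskAssignment(
            Map.of("MAP", "attempt_test_0001_m_000001_0 on tt1",
                   "REDUCE", "attempt_test_0001_r_000001_0 on tt1"),
            List.of("attempt_test_0001_m_000001_0 on tt1",
                    "attempt_test_0001_r_000001_0 on tt1"));
        System.out.println("ok"); // both count and membership checks pass
    }
}
```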
        rahul k singh added a comment -
        • Also, I think we should also check that the number of tasks assigned matches the number expected.
          Checking for size would break other test cases, as they check for maps only or reduces only, even if assignTasks returns both.
          In this patch we check that whatever attempt the user is expecting is present in the tasks list.
        rahul k singh added a comment -

        Attached the new patch addressing Hemanth's comments.

        Hemanth Yamijala added a comment -

        I made a few changes as I was reviewing the last patch by Rahul, and am attaching a new patch with those.

        The changes mainly make checkMultipleTaskAssignment stricter in checking the tasks expected and returned. In particular, the test now fails if a task is scheduled when it should not be. This required checking all tasks returned by the scheduler, even for the overloaded method checkAssignment, and correspondingly the tests in TestCapacityScheduler needed to be changed to use multiple task assignment checks where appropriate. I changed all the user limit tests and testClusterBlockingForLackOfMemory to accommodate this.

        Other than this, I verified other changes are fine. So, I am pushing this patch through Hudson.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421826/MAPREDUCE-1030-7.patch
        against trunk revision 823227.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 13 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/157/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/157/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/157/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        The failed test is being tracked in MAPREDUCE-1029 and is not related to this patch. This is good for commit.

        Hemanth Yamijala added a comment -

        I just committed this to trunk and branch 0.21. Thanks, Rahul !

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #73 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/73/)
        MAPREDUCE-1030. Modified scheduling algorithm to return a map and reduce task per heartbeat in the capacity scheduler. Contributed by Rahul Kumar Singh.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #112 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/112/)
        MAPREDUCE-1030. Modified scheduling algorithm to return a map and reduce task per heartbeat in the capacity scheduler. Contributed by Rahul Kumar Singh.


          People

          • Assignee:
            rahul k singh
            Reporter:
            rahul k singh
          • Votes:
            0
            Watchers:
            3
