Hadoop Map/Reduce
MAPREDUCE-2729

Reducers are always counted as having "pending tasks" even if they can't be scheduled yet because not enough of their mappers have completed

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0
    • Fix Version/s: 0.20.205.0
    • Component/s: None
    • Labels: None
    • Environment: 0.20.1xx-Secondary

      Description

      In the capacity scheduler, the number of users in a queue needing slots is calculated based on whether the users' jobs have any pending tasks.
      This works fine for map tasks. However, for reduce tasks, a job does not need reduce slots until a minimum number of its map tasks have completed.

      Here, we add a check for whether reduces are ready to be scheduled (i.e. whether the job has completed enough map tasks) before incrementing the number of users in a queue needing reduce slots.
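      As a rough illustration of the intended behavior (this is not the committed patch; the class, method, and parameter names below are invented for the sketch), the change amounts to counting a job's reduces as "pending" for user-accounting purposes only once the map-completion threshold is met:

          // Minimal, self-contained sketch; these names are not from the Hadoop code base.
          class ReduceReadinessSketch {

              // A job's reduces should count toward "users needing reduce slots" only when
              // reduces are still pending AND enough maps have finished for reduces to be
              // scheduled (the slowstart threshold, 5% of maps by default in 0.20).
              static boolean reducesNeedSlots(int pendingReduces,
                                              int finishedMaps,
                                              int totalMaps,
                                              float slowstartFraction) {
                  int mapsNeededBeforeReduces = (int) Math.ceil(slowstartFraction * totalMaps);
                  return pendingReduces > 0 && finishedMaps >= mapsNeededBeforeReduces;
              }

              public static void main(String[] args) {
                  // 100 maps with only 3 finished: reduces are pending but not yet schedulable,
                  // so this user should not yet be counted as needing reduce slots.
                  System.out.println(reducesNeedSlots(10, 3, 100, 0.05f));  // false
                  // Once 5 of 100 maps are done, the 5% threshold is met.
                  System.out.println(reducesNeedSlots(10, 5, 100, 0.05f));  // true
              }
          }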

        Activity

        Milind Bhandarkar added a comment -

        It would be good to have a notion of a "ready" task, which is separate from a pending task.
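        A hypothetical sketch of that distinction (illustrative names only, not from the Hadoop code base):

            // Illustrative only; Hadoop's actual task bookkeeping is structured differently.
            enum TaskSchedulability {
                PENDING,  // the task exists and has not run, but its prerequisites may be unmet
                READY,    // schedulable right now, e.g. a reduce whose map threshold is met
                RUNNING,
                COMPLETE
            }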

        Sherry Chen added a comment -

        To pass the TestJobQueueTaskScheduler test cases, the patch for MAPREDUCE-2621 has to be applied.

        Sherry Chen added a comment -

        Ant test passed.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12487910/MAPREDUCE-2729.patch
        against trunk revision 1150926.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/504//console

        This message is automatically generated.

        Arun C Murthy added a comment -

        Sherry, the patch looks good. What sort of testing have you done?

        Sherry Chen added a comment -

        Arun, I ran unit-tests and test-patch. Thx, Sherry

        Arun C Murthy added a comment -

        Sherry - I meant what tests you ran at scale to ensure this works...

        Sherry Chen added a comment -

        Arun, do you mean I need to run tests in a test cluster? I haven't got any cluster to do it.

        Arun C Murthy added a comment -

        Sherry, you need to verify this on a real cluster to be safe before we commit this...

        Arun C Murthy added a comment -

        To qualify: please run it on a cluster of 5-10 nodes, verify the fix manually and please let me know. Thanks.

        Sherry Chen added a comment -

        Tested on a 10-node mini cluster; the test passed.

        Arun C Murthy added a comment -

        Sherri - thanks. Can you please clarify that you manually verified this fix on the cluster? Thanks.

        Sherry Chen added a comment -

        I manually verified this fix on the 10-node cluster.

        Verification steps:
        1. Replace hadoop-capacity-scheduler.jar on the cluster gateway with the jar containing the fix.
        2. Modify capacity-scheduler.xml so that a queue has multiple map and reduce task slots.
        3. Restart mapred.
        4. Submit jobs for one user whose reduces start when 5% of maps complete (the default), and jobs for a second user in the same queue whose reduces start when 50% of maps complete (see the configuration sketch below).
        5. Verify that the first user gets whatever reduce capacity of the queue the second user is not yet using, even when that exceeds the per-user limit.
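        One way to set up the two kinds of jobs in step 4 is via the job configuration (a sketch assuming the standard 0.20 properties mapred.reduce.slowstart.completed.maps and mapred.job.queue.name; the class, job, and queue names below are placeholders, not taken from the actual run):

            import org.apache.hadoop.mapred.JobConf;

            public class SlowstartConfSketch {
                // Configuration for the second user's jobs: hold reduces back until 50% of
                // maps complete. The first user's jobs keep the 0.05 (5%) default, so their
                // reduces become "ready" much earlier and can take the idle reduce capacity.
                public static JobConf secondUserConf() {
                    JobConf conf = new JobConf();
                    conf.setJobName("slowstart-50pct");            // placeholder job name
                    conf.set("mapred.job.queue.name", "default");  // same queue as the first user
                    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.50f);
                    return conf;
                }
            }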

        Arun C Murthy added a comment -

        Sherry, the patch doesn't apply clean - can you please re-generate it? Thanks.

        Thomas Graves added a comment -

        The patch is for the branch-0.20-security branch. I will look at putting it on trunk.

        Arun C Murthy added a comment -

        Thomas, it doesn't make sense to port this to trunk - please don't bother, unless you want to look at this vis-a-vis MAPREDUCE-279.

        Arun C Murthy added a comment -

        Sorry, some weird issue with my patch d/w.

        I just committed this. Thanks Sherry!

        Matt Foley added a comment -

        Closed upon release of 0.20.205.0


          People

          • Assignee: Sherry Chen
          • Reporter: Sherry Chen
          • Votes: 0
          • Watchers: 4
