Hadoop Map/Reduce
MAPREDUCE-722

More slots are getting reserved for HiRAM job tasks than required

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: capacity-sched
    • Labels: None
    • Environment:

      Cluster MR capacity = 248/248. Map slot size = 1500 MB and reduce slot size = 2048 MB. Total number of nodes = 124.
      4 queues, each with Capacity = 25% and User Limit = 100%.

    • Hadoop Flags: Reviewed

      Description

      Submitted a normal job with 124 maps and 124 reduces.
      Then submitted a High RAM job with 31 maps and 31 reduces (map.memory=1800, reduce.memory=2800).
      Then submitted 3 more jobs, each with 124 maps and 124 reduces.
      A total of 248 slots were reserved for both maps and reduces of the High RAM job, which is much higher than required.
      Observed in Hadoop 0.20.0.
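
      For context, a rough estimate of how many slots the High RAM job above should actually need, assuming each high RAM task occupies roughly the ceiling of its memory requirement divided by the slot size (a sketch; the class and method names below are illustrative, not Hadoop APIs):

        // Back-of-the-envelope slot estimate for the High RAM job above
        // (illustrative only; not Hadoop code).
        public class HighRamSlotEstimate {
            // Assumption: a high RAM task spans ceil(taskMemMB / slotMemMB) slots.
            static int slotsPerTask(int taskMemMB, int slotMemMB) {
                return (taskMemMB + slotMemMB - 1) / slotMemMB;
            }

            public static void main(String[] args) {
                int mapSlots    = 31 * slotsPerTask(1800, 1500);  // 31 maps    * 2 slots = 62
                int reduceSlots = 31 * slotsPerTask(2800, 2048);  // 31 reduces * 2 slots = 62
                System.out.println("Map slots needed:    " + mapSlots);     // vs. 248 reserved
                System.out.println("Reduce slots needed: " + reduceSlots);  // vs. 248 reserved
            }
        }

      So roughly 62 map slots and 62 reduce slots should suffice, yet the whole cluster's 248 slots ended up reserved.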

      Attachments

      1. MAPREDUCE-722.1.txt
        13 kB
        Vinod Kumar Vavilapalli
      2. MAPREDUCE-722.txt
        10 kB
        Vinod Kumar Vavilapalli

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/)

        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Vinod!

        Arun C Murthy added a comment -

        +1 for the patch and for not relying on CapacityScheduler.TaskSchedulingMgr.hasSpeculativeTask, which needs to be fixed anyway (MAPREDUCE-725).

        Hemanth Yamijala added a comment -

        Results of test patch

             [exec] -1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        

        The Eclipse classpath problem can be ignored, as Eclipse does not play well with Ivy.

        The patch only touches the capacity scheduler, whose tests pass locally.

        Hemanth Yamijala added a comment -

        Code changes look good to me. +1.

        Hemanth Yamijala added a comment -

        I am not entirely sure this is a bad decision at all. I think we can assume that, at the point the high RAM job decides there are speculative tasks to execute, there will be certain good tasks that do not need speculation, are still running, and will report back to the JobTracker. At that point, we can certainly run the speculative tasks of the high RAM job, and there will be no starvation. I certainly don't believe starvation is guaranteed in any case.

        Also, our speculation heuristics are only now improving (with HADOOP-2141), and there is a good chance users of older versions of Hadoop (0.20 and before) will not rely on speculation working that nicely anyway.

        Also, in the extreme case that starvation is indeed hit, the user can work around it by killing slow-running tasks if the job is taking too long to finish. This is not the greatest workaround, but it will work in the short to medium term.

        So, I would definitely recommend we favor cluster utilization, as I suppose is done in the last patch.

        Vinod Kumar Vavilapalli added a comment -

        Thought about it a bit more and discussed with Hemanth and Devaraj. This does look like a complicated issue and needs a larger discussion.

        For fixing the issue at hand, I am reverting the changes w.r.t. speculative execution. So the attached patch fixes the cluster drain problem in general for high memory jobs, but if a high memory job enables speculative execution, it might starve. Will track the fix for speculative execution in a follow-up JIRA.

        Vinod Kumar Vavilapalli added a comment -

        Hemanth found the reason for this: a faulty conditional in getTaskFromQueue (CapacityTaskScheduler.java +538):

          if (memory requirements for this job match on this TT) {
              Go ahead and give a task
          } else {
              if (getPendingTasks(j) != 0 || hasSpeculativeTask(j, taskTrackerStatus) ||
                  !hasSufficientReservedTaskTrackers(j)) {
                  Reserve this TaskTracker.
              }
          }
        

        Because the conditions are OR'ed instead of AND'ed, reservations continue to be made even when enough reservations already exist, until all nodes in the cluster are reserved for the job.

        I am attaching a patch for this. Changing the conditional to be:

          if ((getPendingTasks(j) != 0 && !hasSufficientReservedTaskTrackers(j))
              || hasSpeculativeTask(j, taskTrackerStatus)) {
              Reserve the taskTracker.
          }
        

        Added a new test case that fails without the code changes and succeeds with them. Also fixed two other tests that were buggy and so didn't catch the problem found in this issue.

        This patch is still incomplete w.r.t. speculative tasks. We need more thought here, as TaskTrackers reserved for one speculative task T-1 may not be usable by another task T-2 of the same job.
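
        To make the effect of the OR'ed vs. AND'ed condition concrete, below is a minimal standalone simulation using the numbers from this report (31 pending high RAM maps, 124 trackers, 2 slots per task). The class and the stand-in for hasSufficientReservedTaskTrackers are simplified illustrations, not the actual CapacityTaskScheduler code:

          // Tiny standalone simulation contrasting the buggy OR'ed reservation
          // check with the fixed AND'ed one (hypothetical names, not scheduler code).
          public class ReservationLogicDemo {
              static final int PENDING_TASKS = 31;      // high RAM map tasks still pending
              static final int SLOTS_PER_TASK = 2;      // a 1800 MB task on 1500 MB slots spans 2 slots
              static final int CLUSTER_TRACKERS = 124;  // one possible reservation per tracker

              // One scheduling sweep over the cluster, reserving trackers according
              // to either the buggy (OR'ed) or fixed (AND'ed) condition.
              static int simulate(boolean useFixedCondition) {
                  int reservedTrackers = 0;
                  for (int tt = 0; tt < CLUSTER_TRACKERS; tt++) {
                      boolean hasPending = PENDING_TASKS != 0;
                      // Simplified stand-in for hasSufficientReservedTaskTrackers(j):
                      // enough trackers are reserved once every pending task has one.
                      boolean hasSufficientReserved = reservedTrackers >= PENDING_TASKS;
                      boolean hasSpeculative = false;  // no speculation in this scenario
                      boolean reserve = useFixedCondition
                              ? (hasPending && !hasSufficientReserved) || hasSpeculative
                              : hasPending || hasSpeculative || !hasSufficientReserved;
                      if (reserve) {
                          reservedTrackers++;
                      }
                  }
                  return reservedTrackers;
              }

              public static void main(String[] args) {
                  int buggy = simulate(false);
                  int fixed = simulate(true);
                  System.out.println("Buggy OR'ed condition:  " + buggy + " trackers ("
                          + buggy * SLOTS_PER_TASK + " map slots) reserved");  // 124 trackers, 248 slots
                  System.out.println("Fixed AND'ed condition: " + fixed + " trackers ("
                          + fixed * SLOTS_PER_TASK + " map slots) reserved");  // 31 trackers, 62 slots
              }
          }

        With the OR'ed check the whole cluster (248 slots) ends up reserved, matching the report; with the AND'ed check reservations stop once enough trackers are held for the pending tasks.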


          People

          • Assignee:
            Vinod Kumar Vavilapalli
          • Reporter:
            Karam Singh
          • Votes:
            0
          • Watchers:
            1
