Issue Details (XML | Word | Printable)

Key: HADOOP-4523
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Vinod K V
Reporter: Vivek Ratan
Votes: 0
Watchers: 4
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Enhance how memory-intensive user tasks are handled

Created: 27/Oct/08 06:47 AM   Updated: 08/Jul/09 04:53 PM
Return to search
Component/s: None
Affects Version/s: 0.19.0
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works HADOOP-4523-200811-05.txt 2008-11-05 09:38 AM Vinod K V 12 kB
Text File Licensed for inclusion in ASF works HADOOP-4523-200811-06.txt 2008-11-06 10:45 AM Vinod K V 17 kB
Text File Licensed for inclusion in ASF works HADOOP-4523-20081110.txt 2008-11-10 10:46 AM Vinod K V 19 kB
Text File Licensed for inclusion in ASF works HADOOP-4523-20081113.txt 2008-11-13 09:58 AM Vinod K V 22 kB
Text File Licensed for inclusion in ASF works HADOOP-4523-20081118.txt 2008-11-18 09:09 AM Vinod K V 22 kB
Issue Links:
Dependants
 

Hadoop Flags: Reviewed
Resolution Date: 19/Nov/08 06:23 AM


 Description  « Hide
HADOOP-3581 monitors each Hadoop task to see if its memory usage (which includes usage of any tasks spawned by it and so on) is within a per-task limit. If the task's memory usage goes over its limit, the task is killed. This, by itself, is not enough to prevent badly behaving jobs from bringing down nodes. What is also needed is the ability to make sure that the sum total of VM usage of all Hadoop tasks does not exceed a certain limit.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Vivek Ratan added a comment - 27/Oct/08 07:56 AM - edited
HADOOP-3759 provides a configuration value, mapred.tasktracker.tasks.maxmemory, which specifies the total VM on a machine available to tasks spawned by the TT. Along with HADOOP-4439, it provides a cluster-wide default for the maximum VM associated per task, mapred.task.default.maxmemory. This value can be overridden by individual jobs. HADOOP-3581 implements a monitoring mechanism that kill tasks if they go over their maxmemory value. Keeping all this in mind, here's a proposal for what we need to additionally do:

If tasks.maxmemory is set, the TT monitors the total memory usage of all tasks spawned by the TT. If this value goes over tasks.maxmemory, the TT needs to kill one or more tasks. It first looks for tasks whose individual memory is over their default.maxmemory value. These are killed (while you may ideally want to kill just enough that your total memory usage comes down, it's not obvious which of these violators you choose to kill, so it's probably simpler to kill all). If no such task is found, or if killing one or more of these tasks still takes us over the memory limit, we need to pick other tasks to kill. There are many ways to do this. Probably the easiest is to kill tasks that ran most recently.

Tasks that are killed because they went over their memory limit should be treated as failed, since they violated their contract. Tasks that are killed because the sum total of memory usage was over a limit should be treated as killed, since it's not really their fault.

Another improvement is to let mapred.tasktracker.tasks.maxmemory be set by an external script, which lets Ops control what this value should be. A slightly less desirable option, as indicated in some offline discussions with Allen W, is to set this value to be an absolute number ("hadoop may use X amount") or an offset of the total amount of memory on the machine ("hadoop may use all but 4g").


Owen O'Malley added a comment - 30/Oct/08 08:22 PM
This jira isn't very clear. What are you proposing changing? Is it to make the mapred.tasktracker.tasks.maxmemory pluggable? If so, I'd propose making an interface like:
abstract class MemoryPlugin {
  long getVirtualMemorySize(Configuration conf);
}

and you configure an implementation of it. (mapred.server.memory.plugin ?)


Vivek Ratan added a comment - 31/Oct/08 06:07 AM
I'm proposing a couple of improvements:
  1. The TT currently monitors each task (and its descendants) to see if that task's memory usage goes over a per-task limit. The TT should additionally monitor to make sure that the sum of memory used by all tasks should not go over a per-node limit (tasks.maxmemory). This situation is unlikely t happen if schedulers consider memory judiciously when scheduling, but not all schedulers may. In addition, the TT should pick tasks that ran last, when deciding what tasks to kill.
  2. I'm also proposing a way to specify tasks.maxmemory. This particular discussion is, however, going on in HADOOP-4035, so we can continue discussion there.

Vinod K V added a comment - 05/Nov/08 09:38 AM
Attaching a patch. This
  • makes TaskMemoryManagerThread to observe total memory usage across all tasks. If total usage crosses overall limit, TT tries and kills any tasks which cross individual task limits. If it cannot find such tasks, it kills the task with the least progress found via TaskTracker.findTaskToKill() which has already been used in case of overflowing disk. This method first tries to find the reduce task with least progress, otherwise it returns the map task with least progress.
  • marks tasks killed because of transgressing individual limits as failed, otherwise they are marked as killed.
  • includes testTasksWithinTTLimits, testTaskBeyondIndividualLimitsAndTotalUsageBeyondTTLimits and testTaskBeyondIndividualLimitsButTotalUsageWithinTTLimits. Couldn't write a test to check killing of a task with least progress; simulating this situation proved very difficult.

Vinod K V added a comment - 06/Nov/08 10:45 AM
Messed up the approach. Here's another patch that gets it right. It
  • monitors and kills any tasks that cross the individual tasks' limits they have.
  • kills the task with least progress(via tasktracker.findTaskToKill()) if even after the first step the total memory usage across all tasks goes over the total usage allowed.
  • includes the tests testTasksWithNoIndividualLimitsButTotalUsageWithinTTLimits, testTasksWithinIndividualLimitsAndTotalUsageWithinTTLimits, testTasksBeyondIndividualLimitsAndTotalUsageWithinTTLimits and testTasksWithinIndividualLimitsButTotalUsageBeyondTTLimits.

Hemanth Yamijala added a comment - 10/Nov/08 08:30 AM
The latest patch kills only the last task that started if the sum total of all tasks' memory usage goes beyond the configured limit. Picking up only one task may or may not bring down the usage to within the configured limits. Should we really be picking up enough tasks to kill ?

Vivek Ratan added a comment - 10/Nov/08 08:37 AM

Should we really be picking up enough tasks to kill ?

Yes. Kill one or more so you go below the limit, as mentioned in the summary in HADOOP-4035.


Vinod K V added a comment - 10/Nov/08 10:46 AM

The latest patch kills only the last task that started if the sum total of all tasks' memory usage goes beyond the configured limit. Picking up only one task may or may not bring down the usage to within the configured limits.

Attaching a new patch to address this. TaskMemoryManagerThread now calls TaskTracker.findTaskToKill() repeatedly to find a few tasks with the least progress so as to bring down the total memory usage of all tasks falls below TT's limit, and then kills them. Modified the signature of TaskTracker.findTaskToKill() to TaskTracker.findTaskToKill(List<TaskAttempId> tasksToExclude) so as to help excluding tasks that are already marked for killing.


Hadoop QA added a comment - 10/Nov/08 07:39 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393629/HADOOP-4523-20081110.txt
against trunk revision 712615.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated 1 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3569/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3569/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3569/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3569/console

This message is automatically generated.


Hemanth Yamijala added a comment - 12/Nov/08 10:35 AM
  • Consider a case where a task just started, and so is sitting in the tasksToBeAdded list, but doesn't have a ProcessTreeInfo. If by some chance, this task is the first reduce task with the least progress, it would be returned in the findTaskToKill method. However, if it is not found in the processTreeInfoMap, it is not added to the tasksToKill list. And also the memory is not reduced. Hence subsequent calls to findTaskToKill will keep repeating this task, and the code would be stuck in a loop.
  • Code related to killing a task is repeated when killing tasks that were over limit, and those that need to be killed because the cumulative limit is still in excess. Can we refactor this into a common code ?
  • Can we improve the diagnostic message being logged when we kill the task. Something like: "Killing task '<tid>' as the cumulative memory usage of tasks exceeds virtual memory limit '<limit>' on the task tracker, as task has the least progress."
  • Add javadoc on param tasksToExclude for findTaskToKill. Can mention that passing null will include all tasks.
  • I think we need the following tests:
    • When memory management is disabled, and tasks running over limits, but nothing is killed. (backwards compatibility)
    • When there are tasks that are individually over limit and also cumulatively over limit (where some tasks haven't specified memory limits)
  • To simulate a task with least progress, can we have some tasks which have very large sleep limit and some with very small, or something like that.
  • Suggest a few shorter names for the tests. Essentially we test with jobs which exceed limits individually, cumulatively, and a mix of both. So something like
    • testJobWithinLimits
    • testJobExceedingLimits
    • testJobsCumulativelyExceedingLimits
    • testMixedSetOfJobsExceedingLimits
  • Few references to WordCount in the comments. Are they valid ?
  • The testTasksWithinIndividualLimitsButTotalUsageBeyondTTLimits (or testJobsCumulativelyExceedingLimits) does not seem deterministic. Indeed this test case failed on my machine. How can we be sure that atleast one overflows. One way could be to a have a TT with 2 maps and 2 reduce slots. Submit a job with 2 map tasks and 2 reduces. Let the tasks ask for high memory so that sum of 2 tasks exceeds the TT limit, and the TT have very low memory limit. Then we can get the task reports and verify that a couple of tasks were killed.

Vinod K V added a comment - 13/Nov/08 09:58 AM
Attaching another patch with the above review comments.

Didn't write a separate testMixedSetExceedingLimits - it seemed to me that it's not adding any value, for it is already being indirectly incorporated in the other two independent tests that verify the tasks' limits and the TT limits.


Hadoop QA added a comment - 14/Nov/08 07:47 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393859/HADOOP-4523-20081113.txt
against trunk revision 713893.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3590/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3590/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3590/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3590/console

This message is automatically generated.


Hemanth Yamijala added a comment - 17/Nov/08 04:26 PM
Code looks good to me. +1

Hemanth Yamijala added a comment - 18/Nov/08 06:27 AM
Vinod, while going over this patch with Devaraj for a quick check, we thought it will be nice to split up the run method in the TaskMemoryManagerThread into a couple of smaller methods, just to ease readability. The rest of the changes are still fine. Can you please submit a new patch with this minor change ?

Vinod K V added a comment - 18/Nov/08 09:09 AM
New patch refactoring the newly added code to a killTasksWithLeastProgress method.

Vinod K V added a comment - 18/Nov/08 09:24 AM

`ant test-patch` results:

[exec] +1 overall.  

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

Hemanth Yamijala added a comment - 18/Nov/08 10:56 AM
Going to run it through hudson once more.

Vinod K V added a comment - 18/Nov/08 01:43 PM
Hudson's patch queue is very long. I ran `ant test` on my machine. It built successfully.

Hemanth Yamijala added a comment - 19/Nov/08 06:23 AM
I just committed this. Thanks, Vinod !

Hudson added a comment - 19/Nov/08 10:17 PM
Integrated in Hadoop-trunk #665 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/665/)
. Prevent too many tasks scheduled on a node from bringing it down by monitoring for cumulative memory usage across tasks. Contributed by Vinod Kumar Vavilapalli

Hadoop QA added a comment - 21/Nov/08 12:50 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394152/HADOOP-4523-20081118.txt
against trunk revision 719431.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3618/console

This message is automatically generated.