|
[
Permlink
| « Hide
]
Hemanth Yamijala added a comment - 17/Jun/08 12:12 PM
This is seen for Streaming applications. For Java based applications, we already have a way of limiting the memory using Hadoop configuration.
mapred.child.ulimit?
Arun, from the discussion on
As discussed on
Comments? What prevents the task tracker from overflowing memory itself? Considering the memory leaks we've already seen in the name node, I don't trust the task trackers to be leak free either.
One of the advantages that we have with HOD presently is that because the limit is set prior to the task tracker getting launched is that the task tracker itself is bounded. This makes the entire hadoop task chain limited and not just individual portions. Whatever system is designed needs to mimic this functionality.
Don't we limit task tracker's usage by setting java system property -Xmx? I believe that if there are TT memory leaks, they should be addressed directly in code. The above proposed method, irrespective of the implementation details, prevents tasks from disrupting hadoop daemons, and this is our main objective here. Attaching a first patch, so that it helps in moving the discussion forward. The patch is still raw and needs a good deal of work up. Much of it is just a proof of concept; enough abstraction is set in place so that actual implementation can be changed easily.
At present,
Need thought/TODO:
We still need decision as to whether we want to 1) limit aggregate usage over all tasks' process trees or 2) limit usage per task's process tree. Believe that both of these can be implemented with the framework setup in current patch. After discussing about this with Hemanth, it came out that we need to reorganize the current code. I propose the following;
TaskTracker maintains a ProcessTree object for each task. public abstract class ProcessTree { /* Initialize the process tree */ public void initialize(); /* Destroy the process tree */ public void destroy(); /* Return total virtual memory usage by this process tree */ public long getVmem(); } My previous code(ProcessTracker: initialize, kill and getCurrentVmemUsage) would be moved to a class LinuxProcessTree that extends ProcessTree. Cygwin seems to support proc filesystem - so the same can be used for windows , need to confirm this for sure, though. For solaris/other OS we need classes extending ProcessTree. Getting pid of the task process : In the first patch, the implementation of getting pid of a process is hacky - put forward here (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4244896 public Integer getPid(); The task itself should return its pid(pid on *NIX and cygwin PIDs on Windows) to TT perhaps by calling native code. Side notes:
Hi !
I think that there is a more general problem, that is task insulation, because a bugged process could many other things than just overloading the memory. The userBasedInsulator.sh that I proposed in [HADOOP-3675] could solve this issue (and a few others) in an easier way. We don't need [HADOOP-3675] to be complete to introduce that approch however. + ArrayList<String> vargs = new ArrayList<String>(8); + // Check for the eventual wrapper script + final String wrapper = conf.get("mapred.child.wrapper"); + if (wrapper != null) + vargs.add(wrapper); - Vector<String> vargs = new Vector<String>(8); Sorry about not stating this before my last comment, but in this JIRA I am restricting myself to writing utilities that track process trees, watch their memory usage, destroy them when overflowing etc. We definitely need the functionality of these classes, and we can use them where ever we wish to, once HADOOP-3675 moves. Will create a new JIRA for this, if that is the correct step forward.
Regarding where we want to perform the resource management - tracking disk space, memory usage etc., we can either implement it in the wrapper or in the TaskTracker itself.
Brice, your idea of doing it in the wrapper script relates to the first point above and thus has some disadvantages. It would be good, if we could just restrict the wrapper to do the most basic need of isoated work-space/chroot/jail-like mechanism and do task's resource management in TT. Comments? [Update]
Brice,
True. In our environments, we have seen processes overloading memory as an oft repeated problem. Hence, we were focussing on that.
So, the goal of this JIRA was to prevent user tasks from adversely affecting one another, or other system daemons on the node by gobbling up memory. We could not find an out-of-the-box OS solution to curtail a process and it's descendents' memory limits. Specifically, ulimit did not seem to handle processes spawned from a parent whose memory limit was set. Maybe virtual machines will help, but IMO, we are still some way off from deciding which tool is suitable for this. That is why this JIRA proposes implementing the tracking of memory on its own. If you are aware of a way this can be achieved with an OS specific mechanism, we can gladly look at that. And it would be significantly easy to use the mechanism you propose (via the wrapper script). Then we can focus on HADOOP-3675. Please do let us know of any solution that you have in mind. One impact of HADOOP-3675 on this work is that, when the mechanism to launch a task becomes pluggable, the way we monitor memory per task might need to change as well. So, for example, if we have a task-per-thread implementation of a task runner, it would be difficult to monitor memory per task because it is in the same process space, right ? In fact this proposal in the patch works only if the task is launched in a separate process.
Could we solve this by adding an extra argument specifying the JobId and the UserId to enable the script to do by job/user accounting ?
The wrapper I proposed before could solve this problem as a side effect (with /etc/security/limits.conf). But it might not be portable and your solution is maybe for this case.
I'm afraid that many functionality will not to be available for threaded tasks anyway. My next proposition will include a fallback mecanism so you should'nt have to take this in account. PS: I'm quite in hurry, please excuse me for my english :-/
I am not sure if I understand this well enough. If you meant "pass JobId/UserId to the script and do per-job/per-user accounting only", then that won't help - we need overall accounting across all tasks.
limits.conf approach is already being evaluated, it doesn't solve the current problem. See this comment on this very JIRA - https://issues.apache.org/jira/browse/HADOOP-3581?focusedCommentId=12607650#action_12607650
This looks like an interesting problem - how do we manage resource usage by each thread? Any thread resource management support in Java? What is the use-case for threaded tasks in the first place? If cost of per-taskJvm is the only reason why we want to run each task in a thread instead of a jvm, we can still achieve resource management of all tasks by forking one single jvm and running all tasks as threads of this jvm. This way we can meet our objective here too - shield hadoop from user code. Summarizing all the discussion that went so far ..
Please put forward your objections to the above proposal, if any. A couple of comments:
As I understand this the job specifies is RAM requirements as a % of I don't think a per job limit on % RAM to map / reduce works. Better E14
+1. I think that is much easier for a user to specify. Here's what I propose with respect to the configuration variables:
Thoughts ? Specifically, on the default values, is it OK to give the same amount of max memory to map tasks and reduce tasks ? Or should we look to divide the max memory so that there's more (say twice) given to the reduce tasks, than to the map tasks ? Looks good to me. I think the only constraint we should place on MAX
task sizes is that there is space for the MAX map plus the MAX reduce on the node. Some more details on the configuration items:
mapred.tasktracker.tasks.maxmemory: We can default this to a value like Long.MAX_VALUE, or maybe even -1L, to disable this feature. Only if it is different from default, will memory monitoring be done. Regarding mapred.map.task.maxmemory, one thought is whether we need separate items for map and reduce tasks, or can we just do with one item, such as mapred.task.maxmemory, which will define the maximum value that will be taken by any task (Map or Reduce) in the job. If it is typical that one type of task (say Reduce), has significantly different memory requirements than the other, then two items may be required. Comments ? Just FYI, I have summarized the entire proposal (which is split between this Jira and
Attaching patch for review. It still doesn't have test-cases and documentation.
Notes:
TODO:
Also, please comment on a bunch of other minor TODO's marked in the patch. Testing:
Attaching a new patch.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12387286/patch_3581_4.3.txt against trunk revision 681429. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 18 new Findbugs warnings. -1 release audit. The applied patch generated 218 release audit warnings (more than the trunk's current 214 warnings). -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2999/testReport/ This message is automatically generated. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12387330/patch_3581_4.4.txt against trunk revision 681493. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. -1 release audit. The applied patch generated 218 release audit warnings (more than the trunk's current 214 warnings). -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3000/testReport/ This message is automatically generated. Cancelling patch to incorporate Devaraj's review comments offline.
Attaching modified patch.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12387547/patch_3581_5.2.txt against trunk revision 682978. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 526 javac compiler warnings (more than the trunk's current 525 warnings). -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3009/testReport/ This message is automatically generated. Attaching a new patch. This one assumes
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12388977/HADOOP-3581-final.txt against trunk revision 689380. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3121/testReport/ This message is automatically generated. The findbugs warning cannot be avoided. The warning "Hard coded reference to an absolute pathname in org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessList()" refers to the absolute path "/proc", which is inevitable.
TestMiniMRDFSSort.testMapReduceSort is not related to this patch, And I can't see any contrib tests failing. Not even in the console output. Wrong report by Hudson?
Some comments:
ProcessTree:
ProcfsBasedProcessTree:
TaskMemoryManagerThread
Few minor nits:
Included all the above comments. Also removed the ProcessTree abstraction as it proved to be a premature abstraction, and is not panning out nicely while passing things like pid, sigKillInterval.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389297/HADOOP-3581.20080901.2.txt against trunk revision 691099. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3155/testReport/ This message is automatically generated. Looks good.
Few minor comments: ProcFsBasedProcessTree
TaskMemoryMonitorThread
hadoop-default.xml
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389342/HADOOP-3581.20080902.txt against trunk revision 691306. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3162/testReport/ This message is automatically generated. The first findBugs warning is already explained, cannot be avoided.
The second one relates to making the member class ProcessTreeInfo static private . The current class hierarchy is TaskTracker(non-static) The findBugs warnings are unavoidable. Patch is committable. Some comments:
1) fix the bug in setTaskMemoryManagerEnabled that unconditionally sets the flag taskMemoryManagerEnabled to true 2) Increase the default monitoring interval to 5 seconds. 300 msecs seems really low 3) Change processTreeInfo to processTreeInfoMap. The name is confusing. 4) Move the PID file cleanup to cleanup of Task. 5) Move the Procfs based code to a different class outside the TaskTracker. 5) The thread doing the memory monitoring is locking quite a big section of code where interaction with the OS is also involved. The same lock is also used for adding tasks to be monitored before their launch. This might negatively impact fast start of tasks. Incorporated 1 through 4.
Changes look fine. Two questions:
We actually don't need these checks, because no where is a task duplicately added to the data structures in our code. Removing these checks, and attaching a new patch.
Not needed. Cleaner thread uses a LinkedBlockingQueue which internally uses locks. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389552/HADOOP-3581.20080905.txt against trunk revision 692409. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3186/testReport/ This message is automatically generated. org.apache.hadoop.fs.kfs.TestKosmosFileSystem.testDirs is unrelated (
Patch still applies cleanly to trunk(after Two minor comments:
1) The javadoc in TaskLog.java has some changes which are really not required. 2) The TaskTracker.getLocalDirAllocator is not required. Just get an instance using new LocalDirAllocator("mapred.local.dir") -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389676/HADOOP-3581.20080908.txt against trunk revision 693048. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3209/testReport/ This message is automatically generated. I just committed this. Thanks, Vinod!
Integrated in Hadoop-trunk #600 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/600/
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||