The attached patch file incorporates the changes as mentioned in the earlier comment.
The key change was to determine processes older than a certain time. To do this, the process tree class keeps track of an 'age' for each process, which is the number of times the process tree has seen a process with this PID. This count is updated every time the process tree is refreshed, which is once every monitoring iteration. The monitoring thread can now ask for the cumulative virtual memory of processes over a certain 'age'. For the sake of simplicity, I've assumed the monitoring interval determines how aged processes are. It is possible to do something more sophisticated; for example, we could determine the walltime of the process by making a system call. There doesn't seem to be a direct API for getting the 'walltime' of a process. One hack would be to read the creation time of the pid directory in /proc and subtract it from the time of day on each iteration. However, this could be a costly operation without giving much more accuracy. Summary of the changes:
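The age-tracking scheme described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the class `AgedProcessTree`, its methods, and the snapshot map of PID to virtual memory are all hypothetical names, and the choice that a newly seen process starts at age 1 and that "over a certain age" means strictly greater are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from the patch.
class AgedProcessTree {
    // Per-process record: virtual memory in bytes and the number of
    // refreshes in which this PID has been seen.
    static final class ProcessInfo {
        final long vmemBytes;
        final int age;

        ProcessInfo(long vmemBytes, int age) {
            this.vmemBytes = vmemBytes;
            this.age = age;
        }
    }

    private Map<Integer, ProcessInfo> processes = new HashMap<>();

    // Called once per monitoring iteration with the current snapshot
    // (PID -> virtual memory in bytes).
    void refresh(Map<Integer, Long> snapshot) {
        Map<Integer, ProcessInfo> updated = new HashMap<>();
        for (Map.Entry<Integer, Long> e : snapshot.entrySet()) {
            ProcessInfo old = processes.get(e.getKey());
            // Assumption: a PID seen for the first time has age 1.
            int age = (old == null) ? 1 : old.age + 1;
            updated.put(e.getKey(), new ProcessInfo(e.getValue(), age));
        }
        // PIDs absent from the snapshot are dropped along with their age.
        processes = updated;
    }

    // Sums the virtual memory of processes whose age is strictly greater
    // than olderThanAge; passing 0 therefore covers the whole tree.
    long cumulativeVmem(int olderThanAge) {
        long total = 0;
        for (ProcessInfo p : processes.values()) {
            if (p.age > olderThanAge) {
                total += p.vmemBytes;
            }
        }
        return total;
    }
}
```

With this shape, the monitoring thread would call `refresh` once per iteration and then compare `cumulativeVmem(0)` (all processes) against `cumulativeVmem(1)` (processes that have survived at least one full interval).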
I suppose this patch will need merging with Also missing is documentation updates for the new semantics of monitoring. Again, I will finish that after the Cancelling for updating with a new patch merging with
New patch merged with
Looked at the patch. Looks very good overall. Only a few comments:
TaskMemoryManagerThread.java
TestProcfsBasedProcessTree.setupProcfsRootDir() :
Minor:
New patch incorporating Vinod's comments.
Results of test-patch on the new patch.
[exec] +1 overall. Also, ran all unit tests. TestQueueCapacities timed out. All other test cases passed. Patch generated against the 20 branch.
The earlier patch had a compile-time error. Uploading a new patch to fix it.
Ran tests against the 20 branch. TestDistributedFileSystem failed, but the patch in no way touches code in this test case. All other tests passed. I will commit this fix.
I just committed this to trunk and branch 0.20.
Integrated in Hadoop-trunk #863 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/)
The particular case in HADOOP-5059 indicates that any process that executes a program from the Java code could occupy (momentarily) twice the amount of memory, due to the JVM's fork()+exec() approach. This happens during the fork() and before the exec() completes.
As a corollary, if there are no processes in the tree older than a certain interval and the tree is still over the limit, we give it the benefit of the doubt to accommodate the case in HADOOP-5059.
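The benefit-of-the-doubt rule can be sketched as a small decision function. This is an illustration under assumptions rather than the committed logic: the class `MemoryLimitCheck`, the method `shouldKill`, and its parameters are hypothetical, and the sketch assumes the caller already knows the tree's total virtual memory and the virtual memory of processes older than one monitoring interval.

```java
// Hypothetical sketch; names are illustrative and not taken from the patch.
final class MemoryLimitCheck {
    private MemoryLimitCheck() {}

    // totalVmem: virtual memory of every process in the tree, in bytes.
    // agedVmem:  virtual memory of processes older than one monitoring
    //            interval, in bytes (0 if there are none).
    // limit:     configured virtual memory limit for the tree, in bytes.
    static boolean shouldKill(long totalVmem, long agedVmem, long limit) {
        if (totalVmem <= limit) {
            return false; // the tree is within its limit
        }
        // The tree is over the limit. Young processes may be in the
        // fork()-before-exec() window, so only the memory held by aged
        // processes counts against the limit.
        return agedVmem > limit;
    }
}
```

For example, with a limit of 100, a tree using 150 with no aged processes is spared for this iteration (`shouldKill(150, 0, 100)` is false), while a tree whose aged processes alone use 120 is not (`shouldKill(150, 120, 100)` is true). A real monitor would probably also cap how far young processes may exceed the limit; this sketch omits that.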