Attaching the patch with a small change: destroyProcessGroup and destroyProcess in ProcessTree now take sleeptime-before-sigkill as a parameter.
The overall summary of the patch is as follows:
(1) Using setsid when starting a task so that all subprocesses of the task have the same sessionId and processGroupId as the java task (and the java task is the process group leader).
(2) Using the pidFile (earlier created only for TaskMemoryManagerThread; now created even if the memory manager is disabled) to get the pid of the java task.
(3) Killing the whole process group if setsid was used to create the java task. Killing only the java task (as before) if setsid is not supported on the machine.
(4) Moved getPidFilePath() and removePidFile() from TaskMemoryManagerThread to TaskTracker as they are independent of TaskMemoryManagerThread.
(5) Created a new class ProcessTree with destroy(), which destroys the process tree (killing the process group if setsid is supported, killing only the java task otherwise). Also moved isAlive(pid) and getPidFromPidFile() to ProcessTree as they are independent of ProcfsBasedProcessTree. destroyProcessGroup verifies that the given pid is indeed a process group leader.
(6) destroyProcessGroup() and destroyProcess() in the ProcessTree class ensure that the process-tree is indeed terminated by sending SIGKILL if SIGTERM is ignored. SigKillThread is moved to ProcessTree as a static inner class and now kills either (a) a single process or (b) a process group, based on a parameter. sigkill also takes a parameter that says whether SIGKILL is to be sent in the same thread or in a separate thread (in the background).
(7) In TestProcfsBasedProcessTree, (a) using setsid to create the java task, (b) using build.test.data as the test dir instead of creating the shellScript and pidFile in the current dir, (c) added a check that verifies that the whole process-tree is indeed killed (this is done by constructing the whole subtree of processes, traversing it, and checking whether any of the processes is alive).
(8) Added a new testcase (with MiniMRCluster) that tests killJob on a job whose tasks have children (or a subtree of processes) and verifies that the subprocesses are also killed. KillMapperWIthChild is the mapper that just sleeps until it gets killed.
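
For readers who want to see the idea behind items (1), (3) and (6) in code, here is a minimal sketch. It is not the code in the patch: the class name ProcessTreeSketch and all method names and signatures below are hypothetical, and it assumes a POSIX-style kill utility on the PATH that accepts "-s SIGNAL -- TARGET" and treats a negative target as a process group id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the ProcessTree class from the patch.
class ProcessTreeSketch {

    // Prefix the task command with setsid (when supported) so the child
    // becomes the leader of a new session and process group.
    static List<String> wrapWithSetsid(List<String> command, boolean setsidSupported) {
        List<String> wrapped = new ArrayList<>();
        if (setsidSupported) {
            wrapped.add("setsid");
        }
        wrapped.addAll(command);
        return wrapped;
    }

    // Build the kill argv. A negative target addresses the whole process
    // group; "--" stops it from being parsed as an option.
    static List<String> killCommand(String signal, long pid, boolean wholeGroup) {
        String target = wholeGroup ? "-" + pid : Long.toString(pid);
        return Arrays.asList("kill", "-s", signal, "--", target);
    }

    // Run a command and return its exit code.
    static int run(List<String> argv) throws Exception {
        return new ProcessBuilder(argv).start().waitFor();
    }

    // Signal 0 delivers nothing; exit code 0 means the target still exists.
    static boolean isAlive(long pid, boolean wholeGroup) throws Exception {
        return run(killCommand("0", pid, wholeGroup)) == 0;
    }

    // Send SIGTERM, wait sleepBeforeSigkillMs, then send SIGKILL if the
    // target is still alive. The wait-and-kill step runs either in the
    // calling thread or in a background daemon thread.
    static void destroy(long pid, boolean wholeGroup,
                        long sleepBeforeSigkillMs, boolean inBackground) throws Exception {
        run(killCommand("TERM", pid, wholeGroup));
        Runnable sigkill = () -> {
            try {
                Thread.sleep(sleepBeforeSigkillMs);
                if (isAlive(pid, wholeGroup)) {
                    run(killCommand("KILL", pid, wholeGroup));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                e.printStackTrace();
            }
        };
        if (inBackground) {
            Thread t = new Thread(sigkill, "sigkill-" + pid);
            t.setDaemon(true);
            t.start();
        } else {
            sigkill.run();
        }
    }
}
```

In this sketch, a caller that launched the task through wrapWithSetsid(..., true) would pass wholeGroup = true to destroy(), and wholeGroup = false on machines without setsid, which mirrors the fallback described in item (3).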