Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-5883

TaskMemoryMonitorThread might shoot down tasks even if their processes momentarily exceed the requested memory

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently the TaskMemoryMonitorThread kills tasks as soon as it detects they are consuming more memory than the max value specified. There are valid cases (see HADOOP-5059) where if a program is executed from the task, it might momentarily occupy twice the amount of memory for a short time. Ideally the monitoring thread should handle this case.

      1. HADOOP-5883.patch
        29 kB
        Hemanth Yamijala
      2. HADOOP-5883.patch
        29 kB
        Hemanth Yamijala
      3. HADOOP-5883.patch
        28 kB
        Hemanth Yamijala
      4. HADOOP-5883-20.patch
        29 kB
        Hemanth Yamijala
      5. HADOOP-5883-20.patch
        29 kB
        Hemanth Yamijala

        Issue Links

          Activity

          Hide
          yhemanth Hemanth Yamijala added a comment -

          As per some offline discussions with Eric, Arun and Vinod, we decided to follow the approach mentioned below for improving the monitoring logic:

          The particular case in HADOOP-5059 indicates that any process that executes a program from the Java code could occupy (momentarily) twice the amount of memory, due to the JVM's fork()+exec() approach. This happens during the fork() and before the exec() completes.

          • Hence, if a process tree is over twice the configured memory limit, then it is shot down immediately.
          • Also, if there are processes in the tree older than a certain interval and these processes alone contribute to being over limit, then the task is shot down.

          By corollary, this means that if there are no processes in the tree older than a certain interval and the tree is still over limit, we give it a benefit of doubt to accomodate the case in HADOOP-5059.

          Show
          yhemanth Hemanth Yamijala added a comment - As per some offline discussions with Eric, Arun and Vinod, we decided to follow the approach mentioned below for improving the monitoring logic: The particular case in HADOOP-5059 indicates that any process that executes a program from the Java code could occupy (momentarily) twice the amount of memory, due to the JVM's fork()+exec() approach. This happens during the fork() and before the exec() completes. Hence, if a process tree is over twice the configured memory limit, then it is shot down immediately. Also, if there are processes in the tree older than a certain interval and these processes alone contribute to being over limit, then the task is shot down. By corollary, this means that if there are no processes in the tree older than a certain interval and the tree is still over limit, we give it a benefit of doubt to accomodate the case in HADOOP-5059 .
          Hide
          yhemanth Hemanth Yamijala added a comment -

          The attached patch file incorporates the changes as mentioned in the earlier comment.

          The key change was to determine processes older than a certain time. To do this, the process tree class keeps track of an 'age' of the process - which is how many time the process tree has seen a process with this PID. This count is updated every time the process tree is refreshed - which is once every monitoring iteration. The monitoring thread can now ask for cumulative virtual memory of processes over a certain 'age'. For the sake of simplicity, I've assumed the monitoring interval determines how aged processes are.

          It is possible to do something more sophisticated - for e.g. we could determine the walltime of the process by making a system call. There doesn't seem to be a direct API for getting the 'walltime' of a process. One hack would be to see the created time of the pid directory in /proc and then subtract it from timeofday each time. However, it seems like this could be a costly operation, while not giving way too much more accuracy.

          Summary of the changes:

          • TaskMemoryManagerThread: changes to the logic to determine if a task is over limit.
          • ProcfsBasedProcessTree: introduces age for processes and updates them
          • The rest of the changes were a lot to do with enabling fast unit tests to be written. I think it is a good idea to move many of the tests for these two classes to use the testing mechanism I'm using. But that's the focus of another JIRA.
          • Introduced new tests for testing just these changes.

          I suppose this patch will need merging with HADOOP-5881, which is likely to go in first. I will update the patch with those changes once it's ready. This is just put up for early consumption.

          Also missing is documentation updates for the new semantics of monitoring. Again, I will finish that after the HADOOP-5881 merge.

          Show
          yhemanth Hemanth Yamijala added a comment - The attached patch file incorporates the changes as mentioned in the earlier comment. The key change was to determine processes older than a certain time. To do this, the process tree class keeps track of an 'age' of the process - which is how many time the process tree has seen a process with this PID. This count is updated every time the process tree is refreshed - which is once every monitoring iteration. The monitoring thread can now ask for cumulative virtual memory of processes over a certain 'age'. For the sake of simplicity, I've assumed the monitoring interval determines how aged processes are. It is possible to do something more sophisticated - for e.g. we could determine the walltime of the process by making a system call. There doesn't seem to be a direct API for getting the 'walltime' of a process. One hack would be to see the created time of the pid directory in /proc and then subtract it from timeofday each time. However, it seems like this could be a costly operation, while not giving way too much more accuracy. Summary of the changes: TaskMemoryManagerThread: changes to the logic to determine if a task is over limit. ProcfsBasedProcessTree: introduces age for processes and updates them The rest of the changes were a lot to do with enabling fast unit tests to be written. I think it is a good idea to move many of the tests for these two classes to use the testing mechanism I'm using. But that's the focus of another JIRA. Introduced new tests for testing just these changes. I suppose this patch will need merging with HADOOP-5881 , which is likely to go in first. I will update the patch with those changes once it's ready. This is just put up for early consumption. Also missing is documentation updates for the new semantics of monitoring. Again, I will finish that after the HADOOP-5881 merge.
          Hide
          yhemanth Hemanth Yamijala added a comment -

          Cancelling for updating with a new patch merging with HADOOP-5881.

          Show
          yhemanth Hemanth Yamijala added a comment - Cancelling for updating with a new patch merging with HADOOP-5881 .
          Hide
          yhemanth Hemanth Yamijala added a comment -

          New patch merged with HADOOP-5881. This should be applied on top of HADOOP-5881, or after it is committed.

          Show
          yhemanth Hemanth Yamijala added a comment - New patch merged with HADOOP-5881 . This should be applied on top of HADOOP-5881 , or after it is committed.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Looked at the patch. Looks very good overall. Only few comments:

          TaskMemoryMangerThread.java

          • (+292) limit != JobConf.DISABLED_MEMORY_LIMIT This check is not needed anymore as JT doesn't access any jobs with no configuration with HADOOP-5881.
          • (+207) We can avoid duplicate calls to ProcessTree.getCumulativeVmem() just before and inside ProcessTreeOverLimit(). For this, we can retain the current method for usage in testcases and add a new method that takes both the limits as parameters.

          TestProcfsBasedProcessTree.setupProcfsRootDir() :

          • Cant' we just clean the proc dir ourselves before the test?
          • We can check for the return value of mkdirs instead of making a new File.exists() call.

          Minor:

          • Missing @throws IOException in javadoc of all the methods newly added in the two testcases.
          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Looked at the patch. Looks very good overall. Only few comments: TaskMemoryMangerThread.java (+292) limit != JobConf.DISABLED_MEMORY_LIMIT This check is not needed anymore as JT doesn't access any jobs with no configuration with HADOOP-5881 . (+207) We can avoid duplicate calls to ProcessTree.getCumulativeVmem() just before and inside ProcessTreeOverLimit(). For this, we can retain the current method for usage in testcases and add a new method that takes both the limits as parameters. TestProcfsBasedProcessTree.setupProcfsRootDir() : Cant' we just clean the proc dir ourselves before the test? We can check for the return value of mkdirs instead of making a new File.exists() call. Minor: Missing @throws IOException in javadoc of all the methods newly added in the two testcases.
          Hide
          yhemanth Hemanth Yamijala added a comment -

          New patch incorporating Vinod's comments.

          Show
          yhemanth Hemanth Yamijala added a comment - New patch incorporating Vinod's comments.
          Hide
          yhemanth Hemanth Yamijala added a comment -

          Results of test-patch on the new patch.

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Also, ran all unit tests. TestQueueCapacities timed out. All other test cases passed.

          Show
          yhemanth Hemanth Yamijala added a comment - Results of test-patch on the new patch. [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Also, ran all unit tests. TestQueueCapacities timed out. All other test cases passed.
          Hide
          yhemanth Hemanth Yamijala added a comment -

          Patch generated against the 20 branch.

          Show
          yhemanth Hemanth Yamijala added a comment - Patch generated against the 20 branch.
          Hide
          yhemanth Hemanth Yamijala added a comment -

          The earlier patch had a compile time error. Updating a new patch to fix it.

          Show
          yhemanth Hemanth Yamijala added a comment - The earlier patch had a compile time error. Updating a new patch to fix it.
          Hide
          yhemanth Hemanth Yamijala added a comment -

          Ran tests against the 20 branch. TestDistributedFileSystem failed, but the patch in no way touches code in this test case. All other tests passed. I will commit this fix.

          Show
          yhemanth Hemanth Yamijala added a comment - Ran tests against the 20 branch. TestDistributedFileSystem failed, but the patch in no way touches code in this test case. All other tests passed. I will commit this fix.
          Hide
          yhemanth Hemanth Yamijala added a comment -

          I just committed this to trunk and branch 0.20.

          Show
          yhemanth Hemanth Yamijala added a comment - I just committed this to trunk and branch 0.20.
          Hide
          hudson Hudson added a comment -
          Show
          hudson Hudson added a comment - Integrated in Hadoop-trunk #863 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/ )

            People

            • Assignee:
              yhemanth Hemanth Yamijala
              Reporter:
              yhemanth Hemanth Yamijala
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development