Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixes a NullPointerException in ProcfsBasedProcessTree that occurs in a corner case.

      Description

      This causes the following exception in TaskMemoryManagerThread. I observed this while running TestTaskTrackerMemoryManager.

      2009-09-02 12:08:25,835 WARN  mapred.TaskMemoryManagerThread (TaskMemoryManagerThread.java:run(239)) - \
                  Uncaught exception in TaskMemoryManager while managing memory of attempt_20090902120812252_0001_m_000003_0 : \
      java.lang.NullPointerException
              at org.apache.hadoop.util.ProcfsBasedProcessTree.assertPidPgrpidForMatch(ProcfsBasedProcessTree.java:234)
              at org.apache.hadoop.util.ProcfsBasedProcessTree.assertAndDestroyProcessGroup(ProcfsBasedProcessTree.java:257)
              at org.apache.hadoop.util.ProcfsBasedProcessTree.destroy(ProcfsBasedProcessTree.java:286)
              at org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:229)
      
      1. MR-962.v1.patch
        4 kB
        Ravi Gummadi
      2. MR-962.v1.1.patch
        4 kB
        Ravi Gummadi
      3. MR-962.patch
        4 kB
        Ravi Gummadi
      4. HADOOP-6232.patch
        1 kB
        Ravi Gummadi

        Activity

        Vinod Kumar Vavilapalli added a comment -

        This is mostly a timing issue and happens when the memory manager tries to destroy a process that has just exited. It didn't affect the testcase. The memory manager code doesn't propagate failures across its processing of multiple tasks, so the side effects seem to be mostly negligible. Because we remove a task entry from processTreeInfoMap only after destroy succeeds, I think a task entry will be left in the map, but as we have enough null checks in place, this process will just be skipped in further iterations.
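
The behaviour described above (one task's failure does not stop processing of the others, and an entry is removed only after destroy succeeds) can be illustrated with a minimal sketch. It is not the TaskMemoryManagerThread code: the class MemoryManagerLoopSketch, the ProcessTree interface and the destroyAll method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the actual TaskMemoryManagerThread code. */
final class MemoryManagerLoopSketch {

  /** Hypothetical stand-in for a per-task process tree. */
  interface ProcessTree {
    void destroy(); // may throw if the process vanished mid-call
  }

  /**
   * Tries to destroy every tracked tree. An entry is removed only after
   * destroy() returns normally; a failure is recorded and the loop moves on,
   * so the failed entry stays in the map for later iterations.
   * Returns the ids of the tasks whose destroy() threw.
   */
  static List<String> destroyAll(Map<String, ProcessTree> processTreeInfoMap) {
    List<String> failed = new ArrayList<>();
    Iterator<Map.Entry<String, ProcessTree>> it =
        processTreeInfoMap.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, ProcessTree> entry = it.next();
      try {
        entry.getValue().destroy();
        it.remove(); // reached only when destroy() succeeded
      } catch (RuntimeException ex) {
        failed.add(entry.getKey()); // isolate the failure to this task
      }
    }
    return failed;
  }
}
```

With three tracked tasks where only the second one throws, the first and third entries are removed and the second remains in the map, which matches the "left in the map, skipped later" outcome described in the comment.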

        Ravi Gummadi added a comment -

        Even though the session leader is gone, the child processes in that session can still be there and they need to be killed.
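
As a rough illustration of this point, the sketch below selects every live process that belongs to a session, whether or not the leader (the process whose pid equals the session id) is still present. The class SessionMembersSketch, the liveMembers method and the pid-to-session map are assumptions made for the example; the actual patch may identify the processes to kill differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; not the actual ProcfsBasedProcessTree code. */
final class SessionMembersSketch {

  /**
   * pidToSession maps each live pid to its session id (assumed non-null).
   * Returns the live pids in the given session in ascending order.
   * The leader does not have to be alive for its children to be found.
   */
  static List<Integer> liveMembers(Map<Integer, Integer> pidToSession,
                                   int sessionId) {
    List<Integer> members = new ArrayList<>();
    // TreeMap copy gives a deterministic, sorted iteration order.
    for (Map.Entry<Integer, Integer> e
        : new TreeMap<>(pidToSession).entrySet()) {
      if (e.getValue() == sessionId) {
        members.add(e.getKey());
      }
    }
    return members;
  }
}
```

If the map holds pids 201 and 202 in session 200 and the leader 200 itself is absent, both children are still returned as candidates for killing.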

        Ravi Gummadi added a comment -

        Attaching patch that fixes the issue.

        Please review and provide your comments.

        Vinod Kumar Vavilapalli added a comment -

        A few comments:

        • HADOOP-6230 is already committed, so we need a new patch that applies to the mapred project for this issue.
        • I think it shouldn't be very difficult to directly test assertAndDestroyProcessGroup()/assertPidPgrpidForMatch()
          using the concept of mock processes, which is already used in TestProcfsBasedProcessTree.
        • Minor:
          • Fix the new comments/log statements introduced in this patch. They can be improved and made more accurate.
          • Wrap lines longer than 80 characters.

        I've corrected the patch myself w.r.t. HADOOP-6230 and can confirm that the NPE messages no longer appear in
        TestTaskTrackerMemoryManager with this patch.
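
One way to picture the mock-process idea is a fake proc directory that a test populates with stat files and then reads back. The sketch below is an assumption-laden illustration, not the code in TestProcfsBasedProcessTree: MockProcfsSketch and its methods are hypothetical, and only the leading fields of the Linux stat line (pid, comm in parentheses, state, ppid, pgrp, session) are modelled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical sketch of a mock /proc tree for tests. */
final class MockProcfsSketch {

  /** Creates an empty temporary directory that stands in for /proc. */
  static Path newMockProcRoot() {
    try {
      return Files.createTempDirectory("mock-proc");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes a fake procRoot/pid/stat file with the leading stat fields. */
  static void writeMockStat(Path procRoot, int pid, String comm,
                            int ppid, int pgrp, int session) {
    try {
      Path dir = Files.createDirectories(
          procRoot.resolve(Integer.toString(pid)));
      String line = pid + " (" + comm + ") S " + ppid + " " + pgrp
          + " " + session + " 0 0\n";
      Files.write(dir.resolve("stat"),
          line.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the process group id, or -1 if the mock process is gone. */
  static int readPgrp(Path procRoot, int pid) {
    Path stat = procRoot.resolve(Integer.toString(pid)).resolve("stat");
    try {
      String content =
          new String(Files.readAllBytes(stat), StandardCharsets.UTF_8);
      // comm may contain spaces, so split only after its closing ')'.
      // Remaining fields: state ppid pgrp session ...
      String[] rest = content.substring(content.lastIndexOf(')') + 1)
          .trim().split("\\s+");
      return Integer.parseInt(rest[2]);
    } catch (NoSuchFileException e) {
      return -1; // process directory absent: treat as already exited
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test built this way can simulate the corner case from this issue by simply not creating (or deleting) the stat file for a pid before invoking the code under test.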

        Ravi Gummadi added a comment -

        Attaching a patch for MapReduce with an added test case.

        Please review and provide your comments.

        Vinod Kumar Vavilapalli added a comment -

        Patch looks good and is well tested. +1. Running it through Hudson.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12419293/MR-962.patch
        against trunk revision 819740.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/140/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/140/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/140/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/140/console

        This message is automatically generated.

        Ravi Gummadi added a comment -

        Attaching new patch removing the unused method constructProcessInfo(ProcessInfo).

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421380/MR-962.v1.patch
        against trunk revision 819740.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/142/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/142/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/142/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/142/console

        This message is automatically generated.

        Ravi Gummadi added a comment -

        The core test failure is a known issue and is not related to this patch.

        Vinod Kumar Vavilapalli added a comment -

        The patch is ready for commit. Can someone commit this?

        @Ravi, just before commit, can you run it through Hudson once more so we are sure? Thanks.

        Ravi Gummadi added a comment -

        Just allowing Hudson to validate the patch against current trunk.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421380/MR-962.v1.patch
        against trunk revision 828979.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/203/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/203/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/203/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/203/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        Apologies for looking at this late. The patch looks fine overall. One minor nit is that the test case testDestroyProcessTree initializes an array procInfos without any need for it. Can this be removed and the patch run through Hudson again, so I can commit it?

        Ravi Gummadi added a comment -

        Attaching a patch that removes the unnecessary array.

        Ravi Gummadi added a comment -

        Allowing the patch to go through Hudson.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12423793/MR-962.v1.1.patch
        against trunk revision 831816.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/118/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/118/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/118/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/118/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        I just committed this to trunk and branch 0.21. Thanks, Ravi!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #109 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/109/)
        . Fix a NullPointerException while killing task process trees. Contributed by Ravi Gummadi.


          People

          • Assignee:
            Ravi Gummadi
            Reporter:
            Vinod Kumar Vavilapalli
          • Votes:
            0
            Watchers:
            3
