Hadoop Map/Reduce / MAPREDUCE-408

TestKillSubProcesses fails with assertion failure sometimes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixed a bug in the testcase TestKillSubProcesses.

      Description

      org.apache.hadoop.mapred.TestKillSubProcesses.testJobKillFailAndSucceed sometimes fails with the following error message:

      Unexpected: The subprocess at level 3 in the subtree is not alive before Job completion
      

      Stacktrace

      junit.framework.AssertionFailedError: Unexpected: The subprocess at level 3 in the subtree is not alive before Job completion
      	at org.apache.hadoop.mapred.TestKillSubProcesses.runJobAndSetProcessHandle(TestKillSubProcesses.java:221)
      	at org.apache.hadoop.mapred.TestKillSubProcesses.runFailingJobAndValidate(TestKillSubProcesses.java:112)
      	at org.apache.hadoop.mapred.TestKillSubProcesses.runTests(TestKillSubProcesses.java:327)
      	at org.apache.hadoop.mapred.TestKillSubProcesses.testJobKillFailAndSucceed(TestKillSubProcesses.java:310)
      

      One such failure: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/495/testReport/org.apache.hadoop.mapred/TestKillSubProcesses/testJobKillFailAndSucceed/

      1. MR-408.patch
        2 kB
        Ravi Gummadi
      2. MR-408.v1.1.patch
        7 kB
        Ravi Gummadi
      3. MR-408.v1.1.y20.patch
        8 kB
        Hemanth Yamijala
      4. MR-408.v1.patch
        7 kB
        Ravi Gummadi
      5. MR-408-yhadoop20.patch
        8 kB
        Ravi Gummadi

        Activity

        Sreekanth Ramakrishnan added a comment -

        Is this failure related to the patch or was it found in Trunk builds?

        In the following trunk builds, the test case passed after the changes to TestKillSubProcesses were put in.

        http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/864/
        http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/865/
        http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/866/

        Amareshwari Sriramadasu added a comment -

        I observed it in one of the patch builds, where the patch was not related to the testcase. It looks like a timing issue.

        Vinod Kumar Vavilapalli added a comment -

        I ran into the same issue a while back. Unfortunately, I didn't back up the logs. I will do so next time.

        Ravi Gummadi added a comment -

        The issue is in the testcase only.
        Attaching a patch that fixes the testcase.

        Ravi Gummadi added a comment -

        Attaching a new patch that cleans up the test case code, based on Vinod's offline comments.

        Ravi Gummadi added a comment -

        The issue is reproducible with trunk if we add Thread.sleep(5000) in runJobAndSetProcessHandle() before the assert statements that check whether the child processes are alive. The problem was that fs was not set in the Mappers, so signalFile creation was not checked, causing the map task to finish immediately (in the case of both the failing mapper and the succeeding mapper).
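
For readers following along, here is a minimal sketch of the failure mode described above. It is not the code from TestKillSubProcesses: the class name SignalFileWait and the method waitForSignal are hypothetical, java.nio.file.FileSystem stands in for Hadoop's FileSystem, and the null check is an assumption about how an unset fs could cause the signal file wait to be skipped.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a mapper that waits on a signal file.
class SignalFileWait {

    /**
     * Polls until signalFile exists or timeoutMillis elapses.
     * Returns true if the file was seen, false otherwise.
     * When fs is null (assumed model of "fs was not set"), no check
     * happens and the caller proceeds immediately.
     */
    static boolean waitForSignal(FileSystem fs, String signalFile, long timeoutMillis)
            throws InterruptedException {
        if (fs == null) {
            return false; // wait is skipped, so the "task" finishes at once
        }
        Path path = fs.getPath(signalFile);
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!Files.exists(path)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the signal
            }
            Thread.sleep(20);
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("signal-demo");
        String signalFile = dir.resolve("signalFile").toString();

        // fs unset: returns without waiting, like the map task that finished early.
        long start = System.nanoTime();
        boolean seen = waitForSignal(null, signalFile, 1000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("fs unset: saw signal=" + seen + ", elapsed ms=" + elapsedMs);

        // fs set: the task stays alive until the test creates the signal file.
        Files.createFile(dir.resolve("signalFile"));
        seen = waitForSignal(FileSystems.getDefault(), signalFile, 1000);
        System.out.println("fs set: saw signal=" + seen);
    }
}
```

With the wait skipped, any subprocess liveness assertion that runs after a delay races against a task that has already exited, which would be consistent with the intermittent failure reported here.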

        Ravi Gummadi added a comment -

        Attaching new patch with a minor change.

        Vinod Kumar Vavilapalli added a comment -

        Patch looks good. +1.

        Documenting what the patch does.

        • Sets the FileSystem in the mapper, which was previously left unset.
        • Consolidates the various signal file/directory related variables so they are set up in a single place.
        • Explicitly sets test.build.data for the child using mapred.child.java.opts, because test.build.data is not otherwise passed to the child, and in our test the child needs access to files/dirs in this temporary dir.
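
The last point can be sketched roughly as follows. This is an illustration only, not the patch itself: the class ChildOpts and the helper withTestBuildData are hypothetical, java.util.Properties stands in for the job configuration, and the "-Xmx200m" starting value and "/tmp" fallback are assumed for the example.

```java
import java.util.Properties;

// Hypothetical helper showing how a -D flag can be appended to child JVM options.
class ChildOpts {

    /** Appends -Dtest.build.data=<dir> to the existing child JVM options, if any. */
    static String withTestBuildData(String existingOpts, String testBuildData) {
        String flag = "-Dtest.build.data=" + testBuildData;
        if (existingOpts == null || existingOpts.trim().isEmpty()) {
            return flag;
        }
        return existingOpts.trim() + " " + flag;
    }

    public static void main(String[] args) {
        // Properties is a stand-in for the job configuration object.
        Properties conf = new Properties();
        conf.setProperty("mapred.child.java.opts", "-Xmx200m"); // assumed starting value

        // The parent test JVM knows test.build.data; the child does not inherit it.
        String dir = System.getProperty("test.build.data", "/tmp"); // assumed fallback
        String opts = withTestBuildData(conf.getProperty("mapred.child.java.opts"), dir);
        conf.setProperty("mapred.child.java.opts", opts);

        System.out.println(conf.getProperty("mapred.child.java.opts"));
    }
}
```

Appending to the existing value rather than overwriting it keeps any options already configured for the child, such as heap size.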
        Ravi Gummadi added a comment -

        Unit tests passed on local machine.

        ant test-patch gave

        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12413637/MR-408.v1.1.patch
        against trunk revision 798239.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/422/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/422/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/422/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/422/console

        This message is automatically generated.

        Ravi Gummadi added a comment -

        Test failures are not related to the patch. All unit tests passed on my local machine.

        Devaraj Das added a comment -

        I just committed this. Thanks, Ravi!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #38 (see http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/38/). Fixes an assertion problem in TestKillSubProcesses. Contributed by Ravi Gummadi.

        Ravi Gummadi added a comment -

        This fix needs to be ported to the Y! 20 distribution. Attaching a patch for the same.

        Vinod Kumar Vavilapalli added a comment -

        +1 for the Y! 20 distribution patch. I could reproduce the bug on Y! distribution without the patch, and I've verified that the patch applies successfully and solves the problem with the test-case.

        Hemanth Yamijala added a comment -

        A more up-to-date version of the patch for Hadoop 0.20 (not for commit).


          People

          • Assignee:
            Ravi Gummadi
            Reporter:
            Amareshwari Sriramadasu
          • Votes:
            0
            Watchers:
            2
