Hadoop Common / HADOOP-5067

Failed/Killed attempts column in jobdetails.jsp does not show the number of failed/killed attempts correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I can see a task failure on the taskdetails.jsp page, but the failed/killed attempts column in jobdetails.jsp shows zero.

      1. patch-5067.txt
        8 kB
        Amareshwari Sriramadasu
      2. patch-5067-0.19.txt
        8 kB
        Amareshwari Sriramadasu
      3. patch-5067-1-0.19.txt
        8 kB
        Amareshwari Sriramadasu
      4. patch-5067-1.txt
        8 kB
        Amareshwari Sriramadasu

        Activity

        Hudson added a comment -

        Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/ )

        Devaraj Das added a comment -

        I just committed this. Thanks, Amareshwari!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12400077/patch-5067-1.txt
        against trunk revision 743513.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3838/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3838/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3838/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3838/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        Runtime for the testcase:
        [junit] Running org.apache.hadoop.mapred.TestLostTracker
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 69.852 sec

        Amareshwari Sriramadasu added a comment -

        patch for trunk and 0.20

        Amareshwari Sriramadasu added a comment -

        patch for 0.19

        Amareshwari Sriramadasu added a comment -

        cancelling patch to optimize testcase

        Amareshwari Sriramadasu added a comment -

        I don't see much scope for refactoring in the testcase. The main methods are already refactored into the UtilsForTests class.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12399641/patch-5067.txt
        against trunk revision 741330.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3805/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3805/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3805/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3805/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        patch for branch 0.19

        Amar Kamat added a comment -

        Looks like the testcase is similar to TestJobTrackerRestartWithLostTracker. Can we factor out/reuse that code? The changes in the framework look fine to me.

        Amareshwari Sriramadasu added a comment -

        The patch applies to trunk and 0.20. I will upload a patch for 0.19.

        Amareshwari Sriramadasu added a comment -

        test-patch result:

             [exec]
             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
        
        Amareshwari Sriramadasu added a comment -

        Attaching patch with the fix. Also added a test-case

        Amar Kamat added a comment -

        Looks like the problem is with

        boolean isPresent = this.activeTasks.remove(taskid) != null;
        

        This check is made to avoid wrong/incorrect updates for stray attempts (attempts for which there is no entry in the jobhistory). Consider a case where the attempt start and end lines are still in the buffer when the jobtracker dies. In such a case the reducers might get the map completion event, but the restarted jobtracker might not know about the attempt. So ideally any complaint about this map attempt should be ignored, as the map will be re-executed. The idea was to update only if the TaskInProgress knows about the attempt. Maybe we should use tasks.keySet().contains() instead of activeTasks.remove(). Thoughts?
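
The difference between the two checks can be illustrated with a minimal sketch. It assumes a simplified model with one map of all known attempts (`tasks`) and one of currently running attempts (`activeTasks`). The class `AttemptLookupSketch`, its methods, and the attempt ids are hypothetical and are not the actual TaskInProgress code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for per-task bookkeeping; not the real TaskInProgress. */
class AttemptLookupSketch {
    // Every attempt this task has ever known about (attempt id -> status).
    private final Map<String, String> tasks = new HashMap<>();
    // Attempts that are currently running (attempt id -> tracker name).
    private final Map<String, String> activeTasks = new HashMap<>();

    void start(String attemptId, String tracker) {
        tasks.put(attemptId, "RUNNING");
        activeTasks.put(attemptId, tracker);
    }

    void complete(String attemptId) {
        tasks.put(attemptId, "SUCCEEDED");
        activeTasks.remove(attemptId); // a finished attempt is no longer active
    }

    // The check quoted above: true only while the attempt is still active.
    boolean isPresentViaActiveTasks(String attemptId) {
        return activeTasks.remove(attemptId) != null;
    }

    // The suggested alternative: true for any attempt the task knows about.
    boolean isPresentViaTasks(String attemptId) {
        return tasks.keySet().contains(attemptId);
    }

    public static void main(String[] args) {
        AttemptLookupSketch tip = new AttemptLookupSketch();
        String completed = "attempt_m_000000_0"; // made-up id
        String stray = "attempt_m_000000_1";     // never recorded by this task

        tip.start(completed, "tracker_1");
        tip.complete(completed);

        // A completed map that later fails (for example, too many fetch failures):
        System.out.println(tip.isPresentViaActiveTasks(completed)); // false
        System.out.println(tip.isPresentViaTasks(completed));       // true

        // A stray attempt is rejected by both checks:
        System.out.println(tip.isPresentViaActiveTasks(stray));     // false
        System.out.println(tip.isPresentViaTasks(stray));           // false
    }
}
```

In this model, the `activeTasks.remove()` check rejects a completed attempt that fails later, so its failure would never be counted, while the `tasks` lookup still accepts it and continues to ignore stray attempts.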

        Amareshwari Sriramadasu added a comment -

        This got introduced in 0.19 by HADOOP-3245

        Devaraj Das added a comment -

        I noticed that even for Lost TT, the column doesn't list the KILLED maps. Maybe the problem is generally true for COMPLETED maps that are now KILLED.

        Amareshwari Sriramadasu added a comment -

        The failure was because of too many fetch failures.


          People

          • Assignee:
            Amareshwari Sriramadasu
            Reporter:
            Amareshwari Sriramadasu