
Key: HADOOP-5067
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Amareshwari Sriramadasu
Reporter: Amareshwari Sriramadasu
Votes: 0
Watchers: 0
Hadoop Common

Failed/Killed attempts column in jobdetails.jsp does not show the number of failed/killed attempts correctly

Created: 16/Jan/09 08:05 AM   Updated: 08/Jul/09 04:53 PM
Component/s: None
Affects Version/s: 0.19.0
Fix Version/s: 0.19.1

Time Tracking:
Not Specified

File Attachments (text files, licensed for inclusion in ASF works):
  File                     Uploaded             Uploaded by              Size
  patch-5067-0.19.txt      2009-02-06 11:49 AM  Amareshwari Sriramadasu  8 kB
  patch-5067-1-0.19.txt    2009-02-12 04:49 AM  Amareshwari Sriramadasu  8 kB
  patch-5067-1.txt         2009-02-12 04:49 AM  Amareshwari Sriramadasu  8 kB
  patch-5067.txt           2009-02-06 10:38 AM  Amareshwari Sriramadasu  8 kB

Hadoop Flags: Reviewed
Resolution Date: 13/Feb/09 04:02 AM


Description
The taskdetails.jsp page shows a failed attempt for one of the tasks, but the Failed/Killed attempts column in jobdetails.jsp shows zero.

Comments
Amareshwari Sriramadasu added a comment - 16/Jan/09 09:18 AM
The failure was because of too many fetch failures.

Devaraj Das added a comment - 06/Feb/09 06:43 AM
I noticed that even for a lost TaskTracker (TT), the column doesn't list the KILLED maps. The problem may apply generally to maps that were COMPLETED and are now KILLED.

Amareshwari Sriramadasu added a comment - 06/Feb/09 07:29 AM
This was introduced in 0.19 by HADOOP-3245.

Amar Kamat added a comment - 06/Feb/09 08:06 AM
Looks like the problem is with
boolean isPresent = this.activeTasks.remove(taskid) != null;

This check exists to avoid incorrect updates for stray attempts (attempts that have no entry in the job history). Consider a case where an attempt's start and end lines are still in the buffer when the jobtracker dies. The reducers might then get the map completion event, but the restarted jobtracker might not know about the attempt. Ideally, any complaint about this map attempt should be ignored, since the map will be re-executed. The idea was to update only if the TaskInProgress knows about the attempt. Maybe we should use tasks.keySet().contains() instead of activeTasks.remove(). Thoughts?
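
To illustrate the difference between the two checks, here is a minimal sketch in Java. It is a simplified model and not the Hadoop source: the class AttemptTrackerSketch, its methods, the String attempt ids, and the two counters are hypothetical. Only the two expressions being compared come from the comment above. It assumes that a completed attempt is dropped from activeTasks but stays in tasks.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of per-task attempt bookkeeping (not Hadoop code). */
class AttemptTrackerSketch {
    // Attempts that are currently running (stand-in for activeTasks).
    private final Map<String, String> activeTasks = new HashMap<>();
    // Every attempt ever launched for this task (stand-in for tasks).
    private final Map<String, String> tasks = new HashMap<>();

    private int countWithRemoveCheck = 0;
    private int countWithContainsCheck = 0;

    void launch(String attemptId, String tracker) {
        activeTasks.put(attemptId, tracker);
        tasks.put(attemptId, tracker);
    }

    // Assumption: a successfully completed attempt is no longer active.
    void complete(String attemptId) {
        activeTasks.remove(attemptId);
    }

    // A failure or kill is reported for an attempt, possibly after it completed.
    void reportFailure(String attemptId) {
        // Check quoted in the comment: true only if the attempt was still active.
        boolean isPresentByRemove = activeTasks.remove(attemptId) != null;
        if (isPresentByRemove) {
            countWithRemoveCheck++;
        }
        // Proposed check: true for any attempt this task knows about.
        boolean isPresentByContains = tasks.keySet().contains(attemptId);
        if (isPresentByContains) {
            countWithContainsCheck++;
        }
    }

    int countWithRemoveCheck() { return countWithRemoveCheck; }
    int countWithContainsCheck() { return countWithContainsCheck; }

    public static void main(String[] args) {
        AttemptTrackerSketch tip = new AttemptTrackerSketch();
        tip.launch("attempt_1", "tracker_A");
        tip.complete("attempt_1");
        // Completed map later fails, e.g. because of fetch failures.
        tip.reportFailure("attempt_1");
        // Stray attempt that was never recorded: both checks ignore it.
        tip.reportFailure("attempt_stray");
        System.out.println("remove-based: " + tip.countWithRemoveCheck()
            + ", contains-based: " + tip.countWithContainsCheck());
        // prints "remove-based: 0, contains-based: 1"
    }
}
```

In this model the remove-based check misses a failure that arrives after the attempt has completed, while the contains-based check counts it and still ignores the stray attempt.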


Amareshwari Sriramadasu added a comment - 06/Feb/09 10:38 AM
Attaching a patch with the fix. Also added a test case.

Amareshwari Sriramadasu added a comment - 06/Feb/09 10:38 AM
test-patch result:
     [exec]
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]

Amareshwari Sriramadasu added a comment - 06/Feb/09 11:37 AM
The patch applies to trunk and 0.20. I will upload a patch for 0.19.

Amar Kamat added a comment - 06/Feb/09 11:46 AM
Looks like the testcase is similar to TestJobTrackerRestartWithLostTracker. Can we factor out/reuse that code? The changes in the framework look fine to me.

Amareshwari Sriramadasu added a comment - 06/Feb/09 11:49 AM
patch for branch 0.19

Hadoop QA added a comment - 06/Feb/09 02:53 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12399641/patch-5067.txt
against trunk revision 741330.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3805/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3805/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3805/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3805/console

This message is automatically generated.


Amareshwari Sriramadasu added a comment - 10/Feb/09 04:57 AM
I don't see much scope for refactoring in the testcase. The main methods are already factored out into the UtilsForTests class.

Amareshwari Sriramadasu added a comment - 12/Feb/09 04:43 AM
cancelling patch to optimize testcase

Amareshwari Sriramadasu added a comment - 12/Feb/09 04:49 AM
patch for 0.19

Amareshwari Sriramadasu added a comment - 12/Feb/09 04:49 AM
patch for trunk and 0.20

Amareshwari Sriramadasu added a comment - 12/Feb/09 04:50 AM
Runtime for the testcase:
[junit] Running org.apache.hadoop.mapred.TestLostTracker
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 69.852 sec

Hadoop QA added a comment - 12/Feb/09 12:56 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12400077/patch-5067-1.txt
against trunk revision 743513.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3838/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3838/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3838/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3838/console

This message is automatically generated.


Devaraj Das added a comment - 13/Feb/09 04:02 AM
I just committed this. Thanks, Amareshwari!

Hudson added a comment - 16/Feb/09 05:00 PM