Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 2.0.3-alpha, 0.23.6
Description
Currently the historyserver does not provide counters for failed tasks, even though they are available via the AM while the job is still running. Those counters are lost when the client is redirected to the historyserver after the job completes.
Attachments
- MAPREDUCE-4693.1.patch (15 kB, Xuan Gong)
- MAPREDUCE-4693.2.patch (14 kB, Xuan Gong)
- MAPREDUCE-4693.3.patch (23 kB, Xuan Gong)
- MAPREDUCE-4693.4.patch (23 kB, Xuan Gong)
Issue Links
- is related to
  - MAPREDUCE-4689 JobClient.getMapTaskReports on failed job results in NPE (Closed)
  - MAPREDUCE-5309 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server (Closed)
Activity
A couple of comments on the patch. It looks good, but needs some changes.
TaskAttempt20LineEventEmitter and Task20LineHistoryEventEmitter don't need to be changed, unless this change is being made in branch-1 as well.
JobBuilder should be able to handle null counters.
TaskFailedEvent and TaskAttemptUnsuccessfulCompletionEvent should store counters as org.apache.hadoop.mapreduce.Counters, and convert to jobhistory.JhCounters only while serializing (see MapAttemptFinishedEvent). That lowers the AM memory overhead in case the history events processor falls behind.
Needs a unit test.
bq: TaskAttempt20LineEventEmitter, Task20LineHistoryEventEmitter don't need to be changed - unless this change is being made in branch-1 as well.
I changed TaskFailedEvent and TaskAttemptUnsuccessfulCompletionEvent by adding a new constructor without the counters parameter. Otherwise there would be errors in TaskAttempt20LineEventEmitter and Task20LineHistoryEventEmitter, since the old patch added counters as a new constructor parameter.
bq: JobBuilder should be able to handle null counters.
The new patch handles null counters by using EMPTY_COUNTERS when the counters are null.
bq: TaskFailedEvent and TaskAttemptUnsuccessfulCompletionEvent should store counters as org.apache.hadoop.mapreduce.Counters, and convert to jobhistory.JhCounters only while serializing. (See MapAttemptFinishedEvent). That's to lower the AM memory overhead in case the history events processor falls behind.
I think this has already been handled. I made changes to Events.avpr, so TaskFailed and TaskAttemptUnsuccessfulCompletion are generated automatically by Avro, and the counters are converted to jobhistory.JhCounters while serializing.
bq: Needs a unit test.
We already have a test case that covers this. I made a simple change to make sure the counters we get back are neither null nor empty.
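The two changes described in this reply (a counters-free constructor kept for existing callers, and an EMPTY_COUNTERS fallback when counters are null) could look roughly like the sketch below. It is only an illustration under stated assumptions: SimpleCounters, TaskFailedEventSketch, and countersOrEmpty are hypothetical stand-ins, not the real Hadoop classes or the code in the attached patches.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the real Hadoop classes.
final class FailedTaskCountersSketch {

    /** Simplified, immutable counters: counter name -> value. */
    static final class SimpleCounters {
        private final Map<String, Long> values;

        SimpleCounters(Map<String, Long> values) {
            // Defensive copy so later changes to the caller's map are not visible here.
            this.values = Collections.unmodifiableMap(new LinkedHashMap<>(values));
        }

        Map<String, Long> asMap() {
            return values;
        }
    }

    /** Shared empty instance, used when an event carries no counters. */
    static final SimpleCounters EMPTY_COUNTERS =
        new SimpleCounters(Collections.<String, Long>emptyMap());

    /** Simplified failed-task event that can be built with or without counters. */
    static final class TaskFailedEventSketch {
        private final String taskId;
        private final String error;
        private final SimpleCounters counters; // null when built by an older caller

        /** Counters-free constructor, kept so existing callers need no changes. */
        TaskFailedEventSketch(String taskId, String error) {
            this(taskId, error, null);
        }

        TaskFailedEventSketch(String taskId, String error, SimpleCounters counters) {
            this.taskId = taskId;
            this.error = error;
            this.counters = counters;
        }

        String getTaskId() {
            return taskId;
        }

        String getError() {
            return error;
        }

        SimpleCounters getCounters() {
            return counters;
        }
    }

    /** Consumer-side guard: substitute EMPTY_COUNTERS instead of dereferencing null. */
    static SimpleCounters countersOrEmpty(TaskFailedEventSketch event) {
        SimpleCounters counters = event.getCounters();
        return counters == null ? EMPTY_COUNTERS : counters;
    }

    public static void main(String[] args) {
        TaskFailedEventSketch legacy =
            new TaskFailedEventSketch("task_0001_m_000009", "failed by user");
        TaskFailedEventSketch withCounters = new TaskFailedEventSketch(
            "task_0001_m_000009", "failed by user",
            new SimpleCounters(Collections.singletonMap("BYTES_WRITTEN", 1024L)));

        System.out.println(countersOrEmpty(legacy).asMap());
        System.out.println(countersOrEmpty(withCounters).asMap());
    }
}
```

Keeping the two-argument constructor means callers that never had counters compile unchanged, while the guard on the consumer side keeps a null from those callers from turning into a NullPointerException.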
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12570208/MAPREDUCE-4693.2.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-tools/hadoop-rumen.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3347//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3347//console
This message is automatically generated.
... and the counters are converted to jobhistory.JhCounters while serializing.
Storing the counters as org.apache.hadoop.mapreduce.Counters prevents a duplicate copy of the counters from existing until they are actually serialized, which happens in the getDatum() method (see MAPREDUCE-3511).
Other than this one change and a couple of minor formatting fixes, the patch looks good.
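The point about deferring the conversion until getDatum() can be sketched as follows. This is a minimal illustration, not the committed code: InMemoryCounters, SerializedCounters, FailedAttemptEventSketch, and the conversions field are hypothetical stand-ins (playing the roles of org.apache.hadoop.mapreduce.Counters, jobhistory.JhCounters, and the event class), and the conversion counter exists only to make the laziness observable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the real Hadoop or Avro classes.
final class LazyDatumSketch {

    /** Stand-in for the in-memory counters type held by the event. */
    static final class InMemoryCounters {
        final Map<String, Long> values = new LinkedHashMap<>();
    }

    /** Stand-in for the serialization record built only when the event is written. */
    static final class SerializedCounters {
        final List<String> names = new ArrayList<>();
        final List<Long> values = new ArrayList<>();
    }

    /** Number of conversions performed so far; only here to make the laziness visible. */
    static int conversions = 0;

    /** Copies the in-memory counters into the serialization record. */
    static SerializedCounters toSerialized(InMemoryCounters counters) {
        conversions++;
        SerializedCounters out = new SerializedCounters();
        for (Map.Entry<String, Long> entry : counters.values.entrySet()) {
            out.names.add(entry.getKey());
            out.values.add(entry.getValue());
        }
        return out;
    }

    /** Event that keeps only the in-memory counters until getDatum() is called. */
    static final class FailedAttemptEventSketch {
        private final InMemoryCounters counters;
        private SerializedCounters datum; // stays null until first serialization

        FailedAttemptEventSketch(InMemoryCounters counters) {
            this.counters = counters;
        }

        /** Builds the serialized form on first use and reuses it afterwards. */
        SerializedCounters getDatum() {
            if (datum == null) {
                datum = toSerialized(counters);
            }
            return datum;
        }
    }

    public static void main(String[] args) {
        InMemoryCounters counters = new InMemoryCounters();
        counters.values.put("BYTES_WRITTEN", 1024L);

        FailedAttemptEventSketch event = new FailedAttemptEventSketch(counters);
        System.out.println("conversions before getDatum(): " + conversions);
        event.getDatum();
        event.getDatum();
        System.out.println("conversions after two getDatum() calls: " + conversions);
    }
}
```

While events wait in a queue, each one holds a single reference to the existing counters object; the second representation is created only when the writer asks for the datum.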
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12570511/MAPREDUCE-4693.3.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-tools/hadoop-rumen.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3356//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3356//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12571043/MAPREDUCE-4693.4.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-tools/hadoop-rumen.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3362//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3362//console
This message is automatically generated.
1. Ran the patch on a local YARN cluster to verify that the counters for failed tasks are visible.
2. Manually failed these task attempts:
attempt_1361915610508_0012_m_000009_0
attempt_1361915610508_0012_m_000009_1
attempt_1361915610508_0012_m_000009_2
attempt_1361915610508_0012_m_000009_3
3. Attached a job history log file for this job.
1. Ran the randomwriter example.
2. Used hadoop job -fail-task <task-id> to fail the task.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12571240/job_1361915610508_0012-1361920250120-xuan-random-writer-1361920333343-9-0-FAILED-default.jhist
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3372//console
This message is automatically generated.
The latest patch (MAPREDUCE-4693.4.patch) looks good. +1
A couple of follow-up tasks may be required, though: job-level counters in case of job failure, and a fix for Task.selectBestAttempt.
Integrated in Hadoop-trunk-Commit #3392 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3392/)
MAPREDUCE-4693. Historyserver should provide counters for failed tasks. Contributed by Xuan Gong. (Revision 1450956)
Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1450956
Files :
- /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/avro/Events.avpr
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/TaskAttemptUnsuccessfulCompletionEvent.java
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/TaskFailedEvent.java
- /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
- /hadoop/common/trunk/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/JobBuilder.java
Integrated in Hadoop-Yarn-trunk #141 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/141/)
MAPREDUCE-4693. Historyserver should provide counters for failed tasks. Contributed by Xuan Gong. (Revision 1450956)
Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1450956
Integrated in Hadoop-Hdfs-trunk #1330 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1330/)
MAPREDUCE-4693. Historyserver should provide counters for failed tasks. Contributed by Xuan Gong. (Revision 1450956)
Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1450956
Integrated in Hadoop-Mapreduce-trunk #1358 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1358/)
MAPREDUCE-4693. Historyserver should provide counters for failed tasks. Contributed by Xuan Gong. (Revision 1450956)
Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1450956
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12570083/MAPREDUCE-4693.1.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-tools/hadoop-rumen.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3345//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3345//console
This message is automatically generated.