Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-4848

TaskAttemptContext cast error during AM recovery

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.4
    • Fix Version/s: 2.0.3-alpha, 0.23.6
    • Component/s: mr-am
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Recently saw an AM that failed and tried to recover, but the subsequent attempt quickly exited with its own failure during recovery:

      2012-12-05 02:33:36,752 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.ClassCastException: org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl cannot be cast to org.apache.hadoop.mapred.TaskAttemptContext
      	at org.apache.hadoop.mapred.OutputCommitter.recoverTask(OutputCommitter.java:284)
      	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$InterceptingEventHandler.handle(RecoveryService.java:361)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1211)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1177)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:958)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:926)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:918)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
      	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.realDispatch(RecoveryService.java:285)
      	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.dispatch(RecoveryService.java:281)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
      	at java.lang.Thread.run(Thread.java:619)
      2012-12-05 02:33:36,752 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
      

      The RM then launched a third AM attempt which succeeded. The third attempt saw basically no progress after parsing the history file from the second attempt and ran the job again from scratch.

        Activity

        Hide
        Jerry Chen added a comment -

        Hi Jason, I looked into this problem and this is a bug in RecoveryService of MRv2. The cause is that the RecoveryService didn't consider the commiter type (new api commiter or old api commiter).
        I can submit a patch to this issue soon.

        Show
        Jerry Chen added a comment - Hi Jason, I looked into this problem and this is a bug in RecoveryService of MRv2. The cause is that the RecoveryService didn't consider the commiter type (new api commiter or old api commiter). I can submit a patch to this issue soon.
        Hide
        Jerry Chen added a comment -

        Patch attached.

        Show
        Jerry Chen added a comment - Patch attached.
        Hide
        Jerry Chen added a comment -

        Patch submitted. Please kindly help review.

        Show
        Jerry Chen added a comment - Patch submitted. Please kindly help review.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12563716/MAPREDUCE-4848.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3208//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3208//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563716/MAPREDUCE-4848.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3208//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3208//console This message is automatically generated.
        Hide
        Jason Lowe added a comment -

        +1 lgtm.

        Show
        Jason Lowe added a comment - +1 lgtm.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3208 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3208/)
        MAPREDUCE-4848. TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431131)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431131
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Show
        Hudson added a comment - Integrated in Hadoop-trunk-Commit #3208 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3208/ ) MAPREDUCE-4848 . TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431131) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431131 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Hide
        Jason Lowe added a comment -

        Thanks, Jerry! I committed this to trunk, branch-2, and branch-0.23.

        Note that in the future, the "Fix Version/s" field is reserved for recording where the issue was actually fixed and should only be set when the fix is checked in. The "Target Version/s" field is better suited for indicating where a patch should be integrated if accepted.

        Show
        Jason Lowe added a comment - Thanks, Jerry! I committed this to trunk, branch-2, and branch-0.23. Note that in the future, the "Fix Version/s" field is reserved for recording where the issue was actually fixed and should only be set when the fix is checked in. The "Target Version/s" field is better suited for indicating where a patch should be integrated if accepted.
        Hide
        Jerry Chen added a comment -

        Thansk for your comment and help checked in, Jason. I will follow the these guidelines for future work.

        Show
        Jerry Chen added a comment - Thansk for your comment and help checked in, Jason. I will follow the these guidelines for future work.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #92 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/92/)
        MAPREDUCE-4848. TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431131)

        Result = SUCCESS
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431131
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Show
        Hudson added a comment - Integrated in Hadoop-Yarn-trunk #92 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/92/ ) MAPREDUCE-4848 . TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431131) Result = SUCCESS jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431131 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #490 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/490/)
        svn merge -c 1431131 FIXES: MAPREDUCE-4848. TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431137)

        Result = FAILURE
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431137
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #490 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/490/ ) svn merge -c 1431131 FIXES: MAPREDUCE-4848 . TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431137) Result = FAILURE jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431137 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1309 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1309/)
        MAPREDUCE-4848. TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431131)

        Result = FAILURE
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431131
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1309 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1309/ ) MAPREDUCE-4848 . TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431131) Result = FAILURE jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431131 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1281 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1281/)
        MAPREDUCE-4848. TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431131)

        Result = FAILURE
        jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431131
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1281 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1281/ ) MAPREDUCE-4848 . TaskAttemptContext cast error during AM recovery. Contributed by Jerry Chen (Revision 1431131) Result = FAILURE jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1431131 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java

          People

          • Assignee:
            Jerry Chen
            Reporter:
            Jason Lowe
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development