Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-4848

TaskAttemptContext cast error during AM recovery

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.4
    • Fix Version/s: 2.0.3-alpha, 0.23.6
    • Component/s: mr-am
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Recently saw an AM that failed and tried to recover, but the subsequent attempt quickly exited with its own failure during recovery:

      2012-12-05 02:33:36,752 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.ClassCastException: org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl cannot be cast to org.apache.hadoop.mapred.TaskAttemptContext
      	at org.apache.hadoop.mapred.OutputCommitter.recoverTask(OutputCommitter.java:284)
      	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$InterceptingEventHandler.handle(RecoveryService.java:361)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1211)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1177)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
      	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
      	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:958)
      	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:926)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:918)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
      	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.realDispatch(RecoveryService.java:285)
      	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.dispatch(RecoveryService.java:281)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
      	at java.lang.Thread.run(Thread.java:619)
      2012-12-05 02:33:36,752 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
      

      The RM then launched a third AM attempt which succeeded. The third attempt saw basically no progress after parsing the history file from the second attempt and ran the job again from scratch.

        Activity

          People

          • Assignee:
            Jerry Chen
            Reporter:
            Jason Lowe
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development