  1. Hadoop Map/Reduce
  2. MAPREDUCE-4457

mr job invalid transition TA_TOO_MANY_FETCH_FAILURE at FAILED

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.3
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: mrv2
    • Labels:
      None

      Description

      We saw a job go into the ERROR state because of an invalid state transition.

      3,600 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1342238829791_2501_m_007743_0 TaskAttempt Transitioned from SUCCEEDED to FAILED
      2012-07-16 08:49:53,600 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1342238829791_2501_m_008850_0 TaskAttempt Transitioned from SUCCEEDED to FAILED
      2012-07-16 08:49:53,600 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1342238829791_2501_m_017344_1000 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
      2012-07-16 08:49:53,601 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1342238829791_2501_m_000027_0
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at FAILED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:954)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:133)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:913)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:905)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
        at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.realDispatch(RecoveryService.java:285)
        at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.dispatch(RecoveryService.java:281)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
        at java.lang.Thread.run(Thread.java:619)
      2012-07-16 08:49:53,601 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1342238829791_2501_m_029091_1000 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
      2012-07-16 08:49:53,601 INFO [IPC Server handler 17 on 47153] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1342238829791_2501_r_000461_1000

      It looks like we may have received two TA_TOO_MANY_FETCH_FAILURE events for the same task attempt. The first one moved the attempt to FAILED, and the second one was then rejected because the FAILED state has no valid transition for that event.
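
      The suspected failure mode can be reproduced with a small, self-contained state machine. The sketch below is illustrative only: TaskAttemptSketch, its two-state model, and the tolerateDuplicates flag are hypothetical names invented for this example. It is not Hadoop's TaskAttemptImpl or StateMachineFactory, and it is not necessarily what the attached patch does. It assumes the attempt starts in SUCCEEDED and shows one common remedy, registering a self-transition so that a repeated event in FAILED is ignored.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, simplified model of a task attempt; not Hadoop code. */
final class TaskAttemptSketch {
    enum State { SUCCEEDED, FAILED }
    enum Event { TA_TOO_MANY_FETCH_FAILURE }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> transitions = new EnumMap<>(State.class);
    private State state = State.SUCCEEDED; // assumption: the attempt had already succeeded

    TaskAttemptSketch(boolean tolerateDuplicates) {
        for (State s : State.values()) {
            transitions.put(s, new EnumMap<>(Event.class));
        }
        // The first fetch-failure event fails a previously successful attempt.
        transitions.get(State.SUCCEEDED).put(Event.TA_TOO_MANY_FETCH_FAILURE, State.FAILED);
        if (tolerateDuplicates) {
            // Self-transition: a repeated event in FAILED is accepted and changes nothing.
            transitions.get(State.FAILED).put(Event.TA_TOO_MANY_FETCH_FAILURE, State.FAILED);
        }
    }

    State getState() {
        return state;
    }

    void handle(Event event) {
        State next = transitions.get(state).get(event);
        if (next == null) {
            // Mirrors the reported error: no transition is registered for this pair.
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        state = next;
    }

    public static void main(String[] args) {
        TaskAttemptSketch strict = new TaskAttemptSketch(false);
        strict.handle(Event.TA_TOO_MANY_FETCH_FAILURE); // SUCCEEDED -> FAILED
        try {
            strict.handle(Event.TA_TOO_MANY_FETCH_FAILURE); // duplicate event is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        TaskAttemptSketch tolerant = new TaskAttemptSketch(true);
        tolerant.handle(Event.TA_TOO_MANY_FETCH_FAILURE);
        tolerant.handle(Event.TA_TOO_MANY_FETCH_FAILURE); // duplicate event is ignored
        System.out.println(tolerant.getState());
    }
}
```

      With tolerateDuplicates set to false, the second event throws, which corresponds to the behavior in the log above. With it set to true, the attempt simply stays in FAILED. Whether ignoring the duplicate is the right policy depends on why two events were sent in the first place.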

        Attachments

        1. MR-4457.txt
          6 kB
          Robert Joseph Evans

              People

              • Assignee: Robert Joseph Evans (revans2)
              • Reporter: Thomas Graves (tgraves)
              • Votes: 0
              • Watchers: 5
