Hadoop Map/Reduce: MAPREDUCE-4457

mr job invalid transition TA_TOO_MANY_FETCH_FAILURE at FAILED


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.3
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: mrv2
    • Labels: None

    Description

      We saw a job go into the ERROR state because of an invalid state transition.

      3,600 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1342238829791_2501_m_007743_0 TaskAttempt Transitioned from SUCCEEDED to FAILED
      2012-07-16 08:49:53,600 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1342238829791_2501_m_008850_0 TaskAttempt Transitioned from SUCCEEDED to FAILED
      2012-07-16 08:49:53,600 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1342238829791_2501_m_017344_1000 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
      2012-07-16 08:49:53,601 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1342238829791_2501_m_000027_0
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at FAILED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:954)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:133)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:913)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:905)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
        at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.realDispatch(RecoveryService.java:285)
        at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.dispatch(RecoveryService.java:281)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
        at java.lang.Thread.run(Thread.java:619)
      2012-07-16 08:49:53,601 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1342238829791_2501_m_029091_1000 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
      2012-07-16 08:49:53,601 INFO [IPC Server handler 17 on 47153] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1342238829791_2501_r_000461_1000

      It looks like we may have received two TA_TOO_MANY_FETCH_FAILURE events for the same task attempt. The first one moved the attempt to FAILED; the second one then failed because the FAILED state has no valid transition for that event.
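
      To illustrate the suspected failure mode, here is a minimal, self-contained sketch of a table-driven state machine. It is not the Hadoop code and not the patch attached as MR-4457.txt; the class name FetchFailureSketch, the step method, and the ignoreDuplicateInFailed flag are hypothetical names made up for this example. It only shows how a second TA_TOO_MANY_FETCH_FAILURE event is rejected once the attempt is already FAILED, and how treating the duplicate as a no-op in FAILED would avoid the error.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: a tiny transition table, not Hadoop's StateMachineFactory.
class FetchFailureSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }
    enum Event { TA_DONE, TA_TOO_MANY_FETCH_FAILURE }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TABLE =
            new EnumMap<State, Map<Event, State>>(State.class);

    static {
        for (State s : State.values()) {
            TABLE.put(s, new EnumMap<Event, State>(Event.class));
        }
        TABLE.get(State.RUNNING).put(Event.TA_DONE, State.SUCCEEDED);
        TABLE.get(State.SUCCEEDED).put(Event.TA_TOO_MANY_FETCH_FAILURE, State.FAILED);
        // Deliberately no entry for TA_TOO_MANY_FETCH_FAILURE at FAILED.
    }

    // Returns the next state, or throws if the table has no matching transition.
    // With ignoreDuplicateInFailed set, a repeated fetch-failure event in FAILED
    // is treated as a no-op (an assumed mitigation, for illustration only).
    static State step(State current, Event event, boolean ignoreDuplicateInFailed) {
        if (ignoreDuplicateInFailed
                && current == State.FAILED
                && event == Event.TA_TOO_MANY_FETCH_FAILURE) {
            return State.FAILED;
        }
        State next = TABLE.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        // First event: SUCCEEDED -> FAILED.
        State s = step(State.SUCCEEDED, Event.TA_TOO_MANY_FETCH_FAILURE, false);
        try {
            // Second, duplicate event: no transition defined at FAILED.
            step(s, Event.TA_TOO_MANY_FETCH_FAILURE, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // Same duplicate event, but ignored in FAILED.
        System.out.println(step(s, Event.TA_TOO_MANY_FETCH_FAILURE, true));
    }
}
```

      In the sketch, the first call moves the attempt from SUCCEEDED to FAILED, the second call throws because the table has no entry for that event at FAILED, and the third call returns FAILED unchanged.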

      Attachments

        1. MR-4457.txt
          6 kB
          Robert Joseph Evans

            People

              Assignee: revans2 Robert Joseph Evans
              Reporter: tgraves Thomas Graves
