Hadoop YARN / YARN-98

NM Application invalid state transition on reboot command from RM

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

      Description

      If the RM goes down and comes back up, it tells the NM to reboot. When the NM reboots, it aggregates the logs for any applications it still has and then sends each of them the APPLICATION_LOG_HANDLING_FINISHED event. I saw a case where an app that was still in the RUNNING state received APPLICATION_LOG_HANDLING_FINISHED and hit an invalid state transition:

      DeletionService #1 2012-04-11 15:12:40,476 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
      [AsyncDispatcher event handler] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
      at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
      at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
      at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:382)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:517)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:509)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)
      at java.lang.Thread.run(Thread.java:619)
      2012-04-11 15:12:40,476 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1333003059741_15999 transitioned from RUNNING to null
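
The failure above can be illustrated with a minimal table-driven state machine. This is only a sketch of the general technique, not Hadoop's StateMachineFactory: the class AppStateSketch, the FINISHED state, the FINISH_APPLICATION event, and the transition table itself are assumptions made for illustration. Only RUNNING and APPLICATION_LOG_HANDLING_FINISHED are taken from the log.

```java
import java.util.EnumMap;
import java.util.Map;

public class AppStateSketch {
    // Simplified, hypothetical subset of application states and events.
    public enum State { RUNNING, FINISHED }
    public enum Event { FINISH_APPLICATION, APPLICATION_LOG_HANDLING_FINISHED }

    // Transition table: only (state, event) pairs registered here are legal.
    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
    static {
        Map<Event, State> fromRunning = new EnumMap<>(Event.class);
        fromRunning.put(Event.FINISH_APPLICATION, State.FINISHED);
        TABLE.put(State.RUNNING, fromRunning);

        Map<Event, State> fromFinished = new EnumMap<>(Event.class);
        fromFinished.put(Event.APPLICATION_LOG_HANDLING_FINISHED, State.FINISHED);
        TABLE.put(State.FINISHED, fromFinished);
    }

    // Looks up the next state; rejects any pair missing from the table.
    public static State doTransition(State current, Event event) {
        State next = TABLE.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        // The reported case: log handling finishes while the app is still RUNNING.
        try {
            doTransition(State.RUNNING, Event.APPLICATION_LOG_HANDLING_FINISHED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // The expected ordering in this sketch: finish the app first, then finish log handling.
        State s = doTransition(State.RUNNING, Event.FINISH_APPLICATION);
        s = doTransition(s, Event.APPLICATION_LOG_HANDLING_FINISHED);
        System.out.println(s);
    }
}
```

In this sketch the event is rejected because no entry exists for it under RUNNING, which mirrors the "Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING" message in the trace. Whether the right fix is to add such a transition or to make sure the application leaves RUNNING before log handling completes is a design decision this sketch does not settle.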

          Activity

          Omkar Vinit Joshi added a comment -

          This error used to occur. However, because the NM's behavior has changed from restart to resync, it is no longer reproducible.

          Bikas Saha, I noticed this during NM restart: once the containers are killed, their state transitions to ContainerState.EXITED_WITH_SUCCESS. Shouldn't this be ContainerState.EXITED_WITH_FAILURE?

          Omkar Vinit Joshi added a comment -

          After the YARN-495 fix, this issue is no longer reproducible. Closing it as a duplicate.


            People

            • Assignee: Omkar Vinit Joshi
            • Reporter: Thomas Graves
            • Votes: 1
            • Watchers: 7
