  1. Hadoop YARN
  2. YARN-98

NM Application invalid state transition on reboot command from RM

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:
      None

      Description

      If the RM goes down and comes back up, it tells the NM to reboot. When the NM reboots, it aggregates the logs for any applications it still has, then sends each of those apps the APPLICATION_LOG_HANDLING_FINISHED event. I saw a case where an app in the RUNNING state received APPLICATION_LOG_HANDLING_FINISHED and hit an invalid state transition.

      DeletionService #1 2012-04-11 15:12:40,476 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
      [AsyncDispatcher event handler] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
      at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
      at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
      at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:382)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:517)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:509)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)
      at java.lang.Thread.run(Thread.java:619)
      2012-04-11 15:12:40,476 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1333003059741_15999 transitioned from RUNNING to null
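
      The failure above is what a table-driven state machine does when an event arrives in a state that has no transition registered for it. The following is a minimal sketch of that behavior, not Hadoop's StateMachineFactory: the class name, the FINISHED state, the FINISH_APPLICATION event, and the two registered transitions are assumptions made for illustration. Only RUNNING and APPLICATION_LOG_HANDLING_FINISHED come from the log.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch only; names other than RUNNING and
// APPLICATION_LOG_HANDLING_FINISHED are hypothetical.
public class AppStateMachineSketch {
    public enum State { RUNNING, FINISHED }
    public enum Event { FINISH_APPLICATION, APPLICATION_LOG_HANDLING_FINISHED }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    public AppStateMachineSketch() {
        // Assumed transitions: log handling is only expected after the app finished.
        addTransition(State.RUNNING, Event.FINISH_APPLICATION, State.FINISHED);
        addTransition(State.FINISHED, Event.APPLICATION_LOG_HANDLING_FINISHED, State.FINISHED);
    }

    private void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    // Applies the event, or throws if the current state has no arc for it.
    public State handle(Event event) {
        Map<Event, State> arcs = table.get(current);
        State next = (arcs == null) ? null : arcs.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public State getState() { return current; }

    public static void main(String[] args) {
        AppStateMachineSketch app = new AppStateMachineSketch();
        try {
            // Mirrors the reported case: log-handling event while still RUNNING.
            app.handle(Event.APPLICATION_LOG_HANDLING_FINISHED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        app.handle(Event.FINISH_APPLICATION);
        System.out.println(app.handle(Event.APPLICATION_LOG_HANDLING_FINISHED));
    }
}
```

      In this sketch the rejected event leaves the state unchanged, so the app is still RUNNING after the exception. Sending the log-handling event only after the app has left RUNNING avoids the error.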

          Activity

          Omkar Vinit Joshi (ojoshi) added a comment:

          This error used to occur. However, since the behavior change from restart to resync, it is no longer reproducible.

          Bikas Saha, I noticed this during NM restart: once the containers are killed, their state transitions to ContainerState.EXITED_WITH_SUCCESS. Shouldn't this be ContainerState.EXITED_WITH_FAILURE?

          Omkar Vinit Joshi (ojoshi) added a comment:

          After the YARN-495 fix, this issue is no longer reproducible. Closing it as a duplicate.


            People

            • Assignee: Omkar Vinit Joshi (ojoshi)
            • Reporter: Thomas Graves (tgraves)
            • Votes: 1
            • Watchers: 7
