Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-2416

InvalidStateTransitonException in ResourceManager if AMLauncher does not receive response for startContainers() call in time

    Details

    • Hadoop Flags:
      Reviewed

      Description

      AMLauncher calls startContainers(allRequests) to launch a container for application master. Normally, the call comes back immediately so that the RMAppAttempt changes its state from ALLOCATED to LAUNCHED.

      However, we do observed that in some cases, the RPC call came back very late but the AM container was already started. Because the RMAppAttempt stuck in ALLOCATED state, once resource manager received the REGISTERED event from the application master, it threw InvalidStateTransitonException as follows.

      2014-07-05 08:59:05,021 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: REGISTERED at ALLOCATED
      at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
      at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
      at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
      at java.lang.Thread.run(Thread.java:744)

      For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, resource manager kept throwing InvalidStateTransitonException.

      2014-07-05 08:59:06,152 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at ALLOCATED
      at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
      at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
      at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
      at java.lang.Thread.run(Thread.java:744)
      2014-07-05 08:59:07,779 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1404549222428_0001_02_000002 Container Transitioned from NEW to
      ALLOCATED
      2014-07-05 08:59:07,779 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at ALLOCATED
      at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
      at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
      at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
      at java.lang.Thread.run(Thread.java:744)

        Attachments

        1. YARN-2416.003.patch
          9 kB
          Jonathan Eagles
        2. YARN-2416.002.patch
          8 kB
          Jonathan Eagles
        3. YARN-2416.001.patch
          8 kB
          Jonathan Eagles

          Activity

            People

            • Assignee:
              jeagles Jonathan Eagles
              Reporter:
              john.jian.fang Jian Fang
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: