Hadoop YARN
YARN-214

RMContainerImpl does not handle event EXPIRE at state RUNNING

    Details

      Description

      RMContainerImpl has a race condition where a container can enter the RUNNING state just as the container expires. This results in an invalid event transition error:

      2012-11-11 05:31:38,954 [ResourceManager Event Processor] ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
      org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: EXPIRE at RUNNING
              at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
              at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
              at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
              at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:205)
              at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:44)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerCompleted(SchedulerApp.java:203)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1337)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:739)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:659)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
              at java.lang.Thread.run(Thread.java:619)
      

      EXPIRE needs to be handled (or at least ignored) in the RUNNING state to account for this race condition.
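
      To illustrate the idea, here is a minimal, self-contained sketch of a table-driven state machine. It is not the attached patch and does not use the Hadoop StateMachineFactory API; the class ContainerStateSketch, its reduced State and Event enums, and the ignoreExpireWhileRunning flag are all hypothetical names chosen for this example. It shows that an unregistered (state, event) pair produces the kind of error seen in the log above, and that registering a RUNNING-to-RUNNING self-transition on EXPIRE turns the late event into a no-op.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical toy model; the real RMContainerImpl has more states and events.
public class ContainerStateSketch {
    enum State { ACQUIRED, RUNNING, COMPLETED, EXPIRED }
    enum Event { LAUNCHED, FINISHED, EXPIRE }

    /** Thrown when no transition is registered for (current state, event). */
    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(Event event, State state) {
            super("Invalid event: " + event + " at " + state);
        }
    }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.ACQUIRED;

    ContainerStateSketch(boolean ignoreExpireWhileRunning) {
        add(State.ACQUIRED, Event.LAUNCHED, State.RUNNING);
        add(State.ACQUIRED, Event.EXPIRE, State.EXPIRED);
        add(State.RUNNING, Event.FINISHED, State.COMPLETED);
        if (ignoreExpireWhileRunning) {
            // Self-transition: an EXPIRE that loses the race with LAUNCHED is ignored.
            add(State.RUNNING, Event.EXPIRE, State.RUNNING);
        }
    }

    private void add(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    /** Applies the event and returns the new state, or throws if unregistered. */
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new InvalidTransitionException(event, current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        // Without the self-transition, the late EXPIRE is rejected.
        ContainerStateSketch unfixed = new ContainerStateSketch(false);
        unfixed.handle(Event.LAUNCHED);
        try {
            unfixed.handle(Event.EXPIRE);
        } catch (InvalidTransitionException e) {
            System.out.println("without self-transition: " + e.getMessage());
        }

        // With the self-transition, the container simply stays RUNNING.
        ContainerStateSketch fixed = new ContainerStateSketch(true);
        fixed.handle(Event.LAUNCHED);
        System.out.println("with self-transition: " + fixed.handle(Event.EXPIRE));
    }
}
```

      In this sketch the container that is already RUNNING keeps running when the stale EXPIRE arrives, which matches the "at least ignored" behavior requested above. Whether the real fix should also do bookkeeping on that transition is outside what this example covers.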

      Attachments

      1. YARN-214.patch (5 kB, Jonathan Eagles)
      2. YARN-214.patch (5 kB, Jonathan Eagles)
      3. YARN-214.patch (5 kB, Jonathan Eagles)
      4. YARN-214.patch (5 kB, Jonathan Eagles)
      5. YARN-214.patch (6 kB, Jonathan Eagles)

        Activity

        Field: Original Value -> New Value

        Thomas Graves made changes:
          Status: Resolved [ 5 ] -> Closed [ 6 ]
        Thomas Graves made changes:
          Fix Version/s: 0.23.5 [ 12323311 ]
          Fix Version/s: 0.23.6 [ 12323501 ]
        Robert Joseph Evans made changes:
          Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
          Fix Version/s: 2.0.3-alpha [ 12323272 ]
          Fix Version/s: 0.23.6 [ 12323501 ]
          Fix Version/s: 3.0.0 [ 12323268 ]
          Resolution: Fixed [ 1 ]
        Jonathan Eagles made changes:
          Attachment: YARN-214.patch [ 12553803 ]
        Jonathan Eagles made changes:
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Jonathan Eagles made changes:
          Attachment: YARN-214.patch [ 12553734 ]
        Jason Lowe made changes:
          Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Jonathan Eagles made changes:
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Jonathan Eagles made changes:
          Attachment: YARN-214.patch [ 12553451 ]
        Jonathan Eagles made changes:
          Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Jonathan Eagles made changes:
          Attachment: YARN-214.patch [ 12553444 ]
        Jonathan Eagles made changes:
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Jonathan Eagles made changes:
          Attachment: YARN-214.patch [ 12553440 ]
        Jonathan Eagles made changes:
          Assignee: Jonathan Eagles [ jeagles ]
        Jason Lowe created issue.

          People

          • Assignee: Jonathan Eagles
          • Reporter: Jason Lowe
          • Votes: 0
          • Watchers: 7
