Hadoop YARN / YARN-2283

RM failed to release the AM container


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels:
      None
    • Environment:

      NM1: AM running
      NM2: Map task running
      mapreduce.map.maxattempts=1

      Description

      I hit this problem during a container stability test.

      While the job was running, a map task was killed.

      Even though the application is FAILED, the MRAppMaster process keeps running until it times out, because the RM did not release the AM container.

      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_000005 Container Transitioned from RUNNING to COMPLETED
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_000005 in state: COMPLETED event:FINISHED
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1405318134611_0002	CONTAINERID=container_1405318134611_0002_01_000005
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_000005 is written
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_000005
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_000005 of capacity <memory:1024, vCores:1> on host HOST-10-18-40-153:45026, which currently has 1 containers, <memory:2048, vCores:1> used and <memory:6144, vCores:7> available, release resources=true
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=<memory:2048, vCores:1> numContainers=1 user=testos user-resources=<memory:2048, vCores:1>
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_000005, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: <memory:1024, vCores:1>, Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:2048, vCores:1>, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=<memory:8192, vCores:8>
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used=<memory:2048, vCores:1> cluster=<memory:8192, vCores:8>
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:2048, vCores:1>, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
      2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_000001 released container container_1405318134611_0002_01_000005 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED
      2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_000001 with final state: FINISHING
      2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_000001 State change from RUNNING to FINAL_SAVING
      2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING
      2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_000001 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
      2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1405318134611_0002 State change from RUNNING to FINAL_SAVING
      2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1405318134611_0002
      2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_000001 State change from FINAL_SAVING to FINISHING
      2014-07-14 14:43:35,012 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
      2014-07-14 14:43:35,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1405318134611_0002 State change from FINAL_SAVING to FINISHING
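
The state transitions in the excerpt above can be pulled out mechanically, which makes it easier to see where each entity ends up. The following is a minimal sketch, assuming the RM log line format shown in this report; the names `state_changes` and `last_states` are hypothetical helpers, not part of Hadoop. It only looks at "State change from X to Y" lines and ignores everything else.

```python
import re

# Matches RM log lines of the form shown in this report, e.g.
# "<timestamp> INFO <logger class>: <entity id> State change from RUNNING to FINAL_SAVING"
# (assumption: the log layout is the one seen in the excerpt above).
STATE_CHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+: "
    r"(?P<entity>\S+) State change from (?P<old>\w+) to (?P<new>\w+)"
)


def state_changes(log_text):
    """Return (timestamp, entity, old_state, new_state) tuples in log order."""
    changes = []
    for line in log_text.splitlines():
        m = STATE_CHANGE.match(line.strip())
        if m:
            changes.append((m["ts"], m["entity"], m["old"], m["new"]))
    return changes


def last_states(log_text):
    """Map each entity (application or attempt id) to its last logged state."""
    latest = {}
    for _ts, entity, _old, new in state_changes(log_text):
        latest[entity] = new
    return latest


# The four "State change" lines copied from the excerpt in this report.
SAMPLE = """\
2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_000001 State change from RUNNING to FINAL_SAVING
2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1405318134611_0002 State change from RUNNING to FINAL_SAVING
2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_000001 State change from FINAL_SAVING to FINISHING
2014-07-14 14:43:35,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1405318134611_0002 State change from FINAL_SAVING to FINISHING
"""

if __name__ == "__main__":
    for entity, state in last_states(SAMPLE).items():
        print(f"{entity}: {state}")
```

On this excerpt, both the application and its attempt are last seen in FINISHING; no later transition appears in the lines quoted here, which is consistent with the AM container still being held.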
      


              People

              • Assignee: Unassigned
              • Reporter: Nishan Shetty (nishan)
              • Votes: 0
              • Watchers: 6
