Hadoop YARN / YARN-3884

App History status not updated when RMContainer transitions from RESERVED to KILLED

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • None
    • None
    • Component/s: resourcemanager
    • Environment: Suse11 Sp3

    Description

      Setup
      ===============
      1 NM: 3072 MB memory, 16 cores each

      Steps to reproduce
      ===============

      1. Submit apps to Queue 1 with 512 MB and 1 core
      2. Submit apps to Queue 2 with 512 MB and 5 cores

      Many containers are reserved and then unreserved in this case:

      2015-07-02 20:45:31,169 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to RESERVED
      2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container  application=application_1435849994778_0002 resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 used=<memory:2560, vCores:21> cluster=<memory:6144, vCores:32>
      2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6
      2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.96875 absoluteUsedCapacity=0.96875 used=<memory:5632, vCores:31> cluster=<memory:6144, vCores:32>
      2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to ALLOCATED
      2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   OPERATION=AM Allocated Container        TARGET=SchedulerApp     RESULT=SUCCESS  APPID=application_1435849994778_0001    CONTAINERID=container_e24_1435849994778_0001_01_000014
      2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e24_1435849994778_0001_01_000014 of capacity <memory:512, vCores:1> on host host-10-19-92-117:64318, which has 6 containers, <memory:3072, vCores:14> used and <memory:0, vCores:2> available after allocation
      2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignedContainer application attempt=appattempt_1435849994778_0001_000001 container=Container: [ContainerId: container_e24_1435849994778_0001_01_000014, NodeId: host-10-19-92-117:64318, NodeHttpAddress: host-10-19-92-117:65321, Resource: <memory:512, vCores:1>, Priority: 20, Token: null, ] queue=default: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:2560, vCores:5>, usedCapacity=2.0846906, absoluteUsedCapacity=0.41666666, numApps=1, numContainers=5 clusterResource=<memory:6144, vCores:32>
      2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.default stats: default: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:3072, vCores:6>, usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6
      2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 used=<memory:6144, vCores:32> cluster=<memory:6144, vCores:32>
      2015-07-02 20:45:32,143 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_000014 Container Transitioned from ALLOCATED to ACQUIRED
      2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318
      2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container  application=application_1435849994778_0002 resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
      2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node host-10-19-92-143:64318 is reserved by application appattempt_1435849994778_0002_000001
      2015-07-02 20:45:32,213 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_000014 Container Transitioned from ACQUIRED to RUNNING
      2015-07-02 20:45:32,213 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Null container completed...
      2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318
      2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container  application=application_1435849994778_0002 resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
      2015-07-02 20:45:33,178 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node host-10-19-92-143:64318 is reserved by application appattempt_1435849994778_0002_000001
      2015-07-02 20:45:33,704 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Application application_1435849994778_0002 unreserved  on node host: host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> used=<memory:2560, vCores:13>, currently has 0 at priority 20; currentReservation <memory:0, vCores:0>
      2015-07-02 20:45:33,704 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: QueueA used=<memory:2560, vCores:21> numContainers=5 user=dsperf user-resources=<memory:2560, vCores:21>
      2015-07-02 20:45:33,710 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_e24_1435849994778_0002_01_000013, NodeId: host-10-19-92-143:64318, NodeHttpAddress: host-10-19-92-143:65321, Resource: <memory:512, vCores:5>, Priority: 20, Token: null, ] queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5 cluster=<memory:6144, vCores:32>
      2015-07-02 20:45:33,710 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.9166667 absoluteUsedCapacity=0.9166667 used=<memory:5632, vCores:27> cluster=<memory:6144, vCores:32>
      2015-07-02 20:45:33,711 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5
      2015-07-02 20:45:33,711 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1435849994778_0002_000001 released container container_e24_1435849994778_0002_01_000013 on node: host: host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> used=<memory:2560, vCores:13> with event: KILL
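
      To trace what happened to a single container in an excerpt like the one above, the state transitions and release events can be pulled out of the log lines. The following is a minimal Python sketch, not part of the report or of YARN: the function names (summarize, reserved_then_killed) and the regular expressions are illustrative assumptions based only on the message formats visible above, and SAMPLE holds copies of five of those lines.

```python
import re

# Assumed patterns, derived only from the message formats in the excerpt above.
TRANSITION = re.compile(r"(container_\S+) Container Transitioned from (\w+) to (\w+)")
RELEASED = re.compile(r"released container (container_\S+) .* with event: (\w+)")


def summarize(log_lines):
    """Return {container_id: {"states": [...], "release_event": str or None}}."""
    summary = {}
    for line in log_lines:
        m = TRANSITION.search(line)
        if m:
            cid, old, new = m.groups()
            # On the first sighting, record the starting state as well.
            entry = summary.setdefault(cid, {"states": [old], "release_event": None})
            entry["states"].append(new)
            continue
        m = RELEASED.search(line)
        if m:
            cid, event = m.groups()
            entry = summary.setdefault(cid, {"states": [], "release_event": None})
            entry["release_event"] = event
    return summary


def reserved_then_killed(summary):
    """Containers whose last logged state is RESERVED and that were released with KILL."""
    return [
        cid
        for cid, entry in summary.items()
        if entry["states"]
        and entry["states"][-1] == "RESERVED"
        and entry["release_event"] == "KILL"
    ]


# Five lines copied from the excerpt above.
SAMPLE = [
    "2015-07-02 20:45:31,169 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to RESERVED",
    "2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to ALLOCATED",
    "2015-07-02 20:45:32,143 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_000014 Container Transitioned from ALLOCATED to ACQUIRED",
    "2015-07-02 20:45:32,213 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_000014 Container Transitioned from ACQUIRED to RUNNING",
    "2015-07-02 20:45:33,711 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1435849994778_0002_000001 released container container_e24_1435849994778_0002_01_000013 on node: host: host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> used=<memory:2560, vCores:13> with event: KILL",
]

if __name__ == "__main__":
    for cid in reserved_then_killed(summarize(SAMPLE)):
        print(cid)
```

      For SAMPLE, the sketch picks out container_e24_1435849994778_0002_01_000013, whose only logged transition is NEW to RESERVED before it is released with the KILL event.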
      
      

      Impact:

      In the application history server, the status is updated to -1000 (INVALID),
      but the end time is not updated, so the Elapsed Time keeps changing.

      Please see the attached snapshots.
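
      One plausible reading of the symptom, stated here as an assumption rather than something confirmed in this report: if a viewer computes elapsed time from the start and end timestamps and falls back to the current time whenever the end time is unset, a container that never receives an end time shows a larger value on every refresh. A minimal Python sketch of that fallback follows (elapsed_ms and its parameters are hypothetical names, and the timestamps are made-up millisecond values):

```python
import time


def elapsed_ms(start_ms, finish_ms, now_ms=None):
    """Elapsed time in milliseconds, using the current time when no end time is recorded.

    This models an assumed fallback, not code taken from YARN.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if start_ms <= 0:
        return -1  # start time unknown
    end_ms = finish_ms if finish_ms > 0 else now_ms
    return max(end_ms - start_ms, 0)


if __name__ == "__main__":
    # End time never set (0): the result depends on when the page is viewed.
    print(elapsed_ms(1000, 0, now_ms=5000))
    print(elapsed_ms(1000, 0, now_ms=9000))
    # End time recorded: the result is stable regardless of the viewing time.
    print(elapsed_ms(1000, 3000, now_ms=9000))
```

      Under this assumption, recording an end time when the container leaves RESERVED would make the displayed value stable.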

      Attachments

        1. YARN-3884.0008.patch (18 kB, Bibin Chundatt)
        2. YARN-3884.0007.patch (18 kB, Bibin Chundatt)
        3. YARN-3884.0006.patch (18 kB, Bibin Chundatt)
        4. YARN-3884.0005.patch (18 kB, Bibin Chundatt)
        5. YARN-3884.0004.patch (18 kB, Bibin Chundatt)
        6. YARN-3884.0003.patch (22 kB, Bibin Chundatt)
        7. YARN-3884.0002.patch (18 kB, Bibin Chundatt)
        8. Test Result-Container status.jpg (8 kB, Bibin Chundatt)
        9. Elapsed Time.jpg (11 kB, Bibin Chundatt)
        10. Apphistory Container Status.jpg (13 kB, Bibin Chundatt)
        11. 0001-YARN-3884.patch (4 kB, Bibin Chundatt)

            People

              bibinchundatt Bibin Chundatt
              Votes: 0
              Watchers: 12