Hadoop YARN / YARN-2492 (Clone of YARN-796): Allow for (admin) labels on nodes and resource-requests / YARN-4140

RM container allocation delayed in case of app submitted to a node-label partition


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
    • Component/s: scheduler
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      While running an application on a node-label partition, I found that the application execution time was delayed by 5 to 10 minutes for 500 containers. The cluster had 3 machines in total; 2 machines were in the same partition, and the application was submitted to that partition.

      After enabling debug logging, I was able to find the following:

      1. From the AM, the container ask is for OFF_SWITCH.
      2. The RM allocates all the containers as NODE_LOCAL, as shown in the logs below.
      3. Since I had about 500 containers, it took about 6 minutes to allocate the first map after the AM allocation.
      4. Tested with about 1K maps using a Pi job, it took 17 minutes to allocate the next container after the AM allocation.

      Only once all 500 container allocations have been made as NODE_LOCAL is the next container allocation made as OFF_SWITCH (see the request sketch after the logs).

      2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: /default-rack, Relax Locality: true, Node Label Expression: }
      
      2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: *, Relax Locality: true, Node Label Expression: 3}
      
      2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: host-10-19-92-143, Relax Locality: true, Node Label Expression: }
      
      2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: host-10-19-92-117, Relax Locality: true, Node Label Expression: }
      
      2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
      
      2015-09-09 14:35:45,467 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
      
      2015-09-09 14:35:45,831 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
      
      2015-09-09 14:35:46,469 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
      
      2015-09-09 14:35:46,832 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
      
      
      dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1> cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep "root.b.b1" | wc -l
      
      500
      

      (These 500 NODE_LOCAL allocations consume about 6 minutes.)
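
      For reference, the request shape visible in the showRequests lines above can be expressed against the public ResourceRequest API roughly as in the sketch below. It is a minimal illustration built from the values in the logs (priority 20, <memory:512, vCores:1>, 500 containers, the two hosts, /default-rack, and label expression "3" on the ANY request only); it is not the MRAppMaster's actual request-building code.

      import java.util.Arrays;
      import java.util.List;

      import org.apache.hadoop.yarn.api.records.Priority;
      import org.apache.hadoop.yarn.api.records.Resource;
      import org.apache.hadoop.yarn.api.records.ResourceRequest;

      // Sketch of the per-priority request table printed by showRequests above.
      public class LabelRequestSketch {
        public static List<ResourceRequest> mapRequests() {
          Priority pri = Priority.newInstance(20);
          Resource cap = Resource.newInstance(512, 1); // <memory:512, vCores:1>
          int containers = 500;

          // The node-local and rack-local requests carry no node label expression,
          // matching the empty "Node Label Expression:" fields in the logs.
          ResourceRequest node1 = ResourceRequest.newInstance(
              pri, "host-10-19-92-143", cap, containers, true);
          ResourceRequest node2 = ResourceRequest.newInstance(
              pri, "host-10-19-92-117", cap, containers, true);
          ResourceRequest rack = ResourceRequest.newInstance(
              pri, "/default-rack", cap, containers, true);

          // Only the ANY ("*", i.e. OFF_SWITCH) request names partition "3".
          ResourceRequest any = ResourceRequest.newInstance(
              pri, ResourceRequest.ANY, cap, containers, true, "3");

          return Arrays.asList(node1, node2, rack, any);
        }
      }

      Note that only the ANY row in the log carries the partition expression; the host and rack rows carry the default (empty) label, as the logs show.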

      Attachments

        1. 0001-YARN-4140.patch
          3 kB
          Bibin Chundatt
        2. 0002-YARN-4140.patch
          3 kB
          Bibin Chundatt
        3. 0003-YARN-4140.patch
          7 kB
          Bibin Chundatt
        4. 0004-YARN-4140.patch
          10 kB
          Bibin Chundatt
        5. 0005-YARN-4140.patch
          8 kB
          Bibin Chundatt
        6. 0006-YARN-4140.patch
          8 kB
          Bibin Chundatt
        7. 0007-YARN-4140.patch
          9 kB
          Bibin Chundatt
        8. 0008-YARN-4140.patch
          12 kB
          Bibin Chundatt
        9. 0009-YARN-4140.patch
          13 kB
          Bibin Chundatt
        10. 0010-YARN-4140.patch
          12 kB
          Bibin Chundatt
        11. 0011-YARN-4140.patch
          12 kB
          Bibin Chundatt
        12. 0012-YARN-4140.patch
          13 kB
          Bibin Chundatt
        13. 0013-YARN-4140.patch
          13 kB
          Bibin Chundatt
        14. 0014-YARN-4140.patch
          17 kB
          Bibin Chundatt
        15. YARN-4140-branch-2.7.001.patch
          18 kB
          Jonathan Hung
        16. YARN-4140-branch-2.7.002.patch
          18 kB
          Jonathan Hung
        17. YARN-4140-branch-2.7.002-YARN-4250.patch
          18 kB
          Brahma Reddy Battula

            People

              Assignee: Bibin Chundatt (bibinchundatt)
              Reporter: Bibin Chundatt (bibinchundatt)
