Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Hadoop Flags: Reviewed
Description
While running an application on a node-label partition, I found that application execution is delayed by 5 to 10 minutes for 500 containers. The cluster has 3 machines in total; 2 of them are in the same partition, and the application was submitted to that partition.
After enabling debug logging, I found the following:
- The container request from the AM is for OFF_SWITCH.
- The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
- With about 500 containers, it took about 6 minutes to allocate the first map container after the AM was allocated.
- In a test with about 1K maps using the PI job, it took 17 minutes to allocate the next container after the AM was allocated.
Once the 500 NODE_LOCAL container allocations are done, the next container is allocated as OFF_SWITCH.
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: /default-rack, Relax Locality: true, Node Label Expression: }
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: *, Relax Locality: true, Node Label Expression: 3}
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: host-10-19-92-143, Relax Locality: true, Node Label Expression: }
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1441791998224_0001 request={Priority: 20, Capability: <memory:512, vCores:1>, # Containers: 500, Location: host-10-19-92-117, Relax Locality: true, Node Label Expression: }
2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
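The showRequests entries above differ in their Node Label Expression field: only the `*` request carries a label ("3"), while the rack and host requests leave it empty. The following is a minimal Python sketch for extracting that field per location from RM debug output. It assumes the exact line format shown in this report; `REQUEST_RE`, `parse_requests`, and `labels_by_location` are illustrative names, not part of YARN, and the sketch only surfaces the difference rather than diagnosing it.

```python
import re

# Matches the body of a showRequests DEBUG entry as it appears in this report.
# Assumption: the field order and spelling are exactly as in the log above.
REQUEST_RE = re.compile(
    r"showRequests: application=(?P<app>\S+) request=\{"
    r"Priority: (?P<priority>\d+), "
    r"Capability: <(?P<capability>[^>]*)>, "
    r"# Containers: (?P<containers>\d+), "
    r"Location: (?P<location>[^,]+), "
    r"Relax Locality: (?P<relax>true|false), "
    r"Node Label Expression: (?P<label>[^}]*)\}"
)

# Two entries copied from the log in this report.
SAMPLE = (
    "2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server."
    "resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: "
    "application=application_1441791998224_0001 request={Priority: 20, "
    "Capability: <memory:512, vCores:1>, # Containers: 500, "
    "Location: /default-rack, Relax Locality: true, Node Label Expression: } "
    "2015-09-09 15:21:58,954 DEBUG org.apache.hadoop.yarn.server."
    "resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: "
    "application=application_1441791998224_0001 request={Priority: 20, "
    "Capability: <memory:512, vCores:1>, # Containers: 500, "
    "Location: *, Relax Locality: true, Node Label Expression: 3}"
)


def parse_requests(log_text):
    """Return one dict per showRequests entry found in log_text."""
    return [
        {
            "location": m["location"],
            "containers": int(m["containers"]),
            "label": m["label"].strip(),  # empty string when no label is set
        }
        for m in REQUEST_RE.finditer(log_text)
    ]


def labels_by_location(requests):
    """Map each requested location to its node label expression."""
    return {r["location"]: r["label"] for r in requests}


if __name__ == "__main__":
    for location, label in labels_by_location(parse_requests(SAMPLE)).items():
        print(f"{location!r}: label={label!r}")
```

Because `finditer` scans the whole text, the sketch works whether the entries are on separate lines or run together on one line.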
2015-09-09 14:35:45,467 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
2015-09-09 14:35:45,831 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
2015-09-09 14:35:46,469 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
2015-09-09 14:35:46,832 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1> cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep "root.b.b1" | wc -l
500
(Consumes about 6 minutes)
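The count above can also be reproduced together with the elapsed time between the first and last matching assignment. This is a minimal Python sketch, assuming the ParentQueue DEBUG line format shown in this report; `ASSIGN_RE` and `summarize_assignments` are illustrative names, not part of Hadoop, and the two sample entries are copied from the log above.

```python
import re
from datetime import datetime

# Matches a ParentQueue "Assigned to queue" DEBUG entry as shown in this report.
# Assumption: each entry ends with ", <LOCALITY>" after the stats block.
ASSIGN_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG \S+ParentQueue: "
    r"Assigned to queue: (?P<queue>\S+) stats: .*?, "
    r"(?P<type>NODE_LOCAL|RACK_LOCAL|OFF_SWITCH)\b"
)

# First and last entries from the 14:35 log excerpt in this report.
SAMPLE_ASSIGNMENTS = (
    "2015-09-09 14:35:45,467 DEBUG org.apache.hadoop.yarn.server."
    "resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: "
    "root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, "
    "usedResources=<memory:0, vCores:0>, usedCapacity=0.0, "
    "absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> "
    "<memory:0, vCores:0>, NODE_LOCAL\n"
    "2015-09-09 14:35:46,832 DEBUG org.apache.hadoop.yarn.server."
    "resourcemanager.scheduler.capacity.ParentQueue: Assigned to queue: "
    "root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, "
    "usedResources=<memory:0, vCores:0>, usedCapacity=0.0, "
    "absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> "
    "<memory:0, vCores:0>, NODE_LOCAL\n"
)


def summarize_assignments(log_text, queue, locality="NODE_LOCAL"):
    """Return (count, elapsed_seconds) for assignments matching queue and locality."""
    stamps = [
        datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        for m in ASSIGN_RE.finditer(log_text)
        if m["queue"] == queue and m["type"] == locality
    ]
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


if __name__ == "__main__":
    count, elapsed = summarize_assignments(SAMPLE_ASSIGNMENTS, "root.b.b1")
    print(f"{count} NODE_LOCAL assignments over {elapsed:.3f} s")
```

To run it against a full RM log, read the file contents and pass them as `log_text` in place of the sample.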
Issue Links
- breaks
  - MAPREDUCE-6510 TestRMContainerAllocator is failing (Resolved)
  - YARN-4250 NPE in AppSchedulingInfo#isRequestLabelChanged (Resolved)
- relates to
  - YARN-4925 ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression (Resolved)