Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.6.0
- Fix Version/s: None
- Component/s: None
Description
The capacity scheduler delays scheduling a container on a rack-local node in the hope that a node-local opportunity will come along (YARN-80). It does this by counting the number of scheduling opportunities the application has missed; when the count reaches a certain threshold, the app accepts the rack-local node. The documented recommendation is to set this threshold to the number of nodes in the cluster.
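For illustration, a minimal sketch of that counting logic (the method and parameter names here are hypothetical, not the actual CapacityScheduler code; the threshold corresponds to the yarn.scheduler.capacity.node-locality-delay setting):

    // Hypothetical sketch only; names are illustrative, not the real code.
    boolean canAssignRackLocal(int missedOpportunities, int nodeLocalityDelay) {
      // missedOpportunities is bumped each time the app is offered a
      // scheduling opportunity it cannot use node-locally; once the count
      // reaches the threshold (recommended: number of nodes in the cluster),
      // a rack-local assignment is accepted.
      return missedOpportunities >= nodeLocalityDelay;
    }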
However, there are early-out optimizations in the scheduler that can cause this delay to become very long.
Example in allocateContainersToNode():
    // Try to schedule more if there are no reservations to fulfill
    if (node.getReservedContainer() == null) {
      if (calculator.computeAvailableContainers(node.getAvailableResource(),
          minimumAllocation) > 0) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Trying to schedule on node: " + node.getNodeName() +
              ", available: " + node.getAvailableResource());
        }
        root.assignContainers(clusterResource, node, false);
      }
Because this early-out skips root.assignContainers() entirely whenever a node has no available resource, an application only accrues a missed scheduling opportunity when a container completes and frees space. So, in a large cluster that is completely full (available resource on each node is 0), SchedulingOpportunities will only increase at the container-completion rate, not the heartbeat rate, which I think was the original assumption in YARN-80. On a large cluster, this can add up to an hour or more of skipped scheduling opportunities, meaning the FIFO-ness of a queue is ignored for a very long time.
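To make the magnitude concrete, a back-of-the-envelope example (all numbers are assumptions for illustration, not measurements):

    // Illustrative arithmetic only; every number here is assumed.
    int clusterNodes = 4000;          // threshold set to #nodes, per the docs
    double completionsPerSec = 1.0;   // full cluster: an opportunity arises
                                      // only when a container completes
    double delaySec = clusterNodes / completionsPerSec; // 4000 s, ~67 minutes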
Maybe there should be a time-based limit on this delay in addition to the count of missed scheduling opportunities.
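One hypothetical shape for that fix: accept a rack-local assignment once either the count threshold or a wall-clock limit is exceeded (names and parameters below are illustrative, not a patch):

    // Hypothetical sketch of a combined count-or-time relaxation.
    boolean canAssignRackLocal(int missedOpportunities, int countThreshold,
        long firstMissTimeMs, long maxDelayMs) {
      boolean countExceeded = missedOpportunities >= countThreshold;
      // Also relax once a configured wall-clock delay has elapsed since the
      // app first started waiting, so a full cluster cannot stall it for hours.
      boolean timeExceeded =
          System.currentTimeMillis() - firstMissTimeMs >= maxDelayMs;
      return countExceeded || timeExceeded;
    }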
Issue Links
- is related to: SLIDER-799 AM to decide when to relax placement policy from specific host to rack/cluster (Resolved)