[YARN-10380] Import logic of multi-node allocation in CapacityScheduler - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.4.0
Component/s: capacity scheduler
Labels: pull-request-available
Target Version/s: 3.4.0
Hadoop Flags: Reviewed

Description

1) Entry point:
When we do multi-node allocation, we use the same logic as async scheduling:

// Allocate containers of node [start, end)
for (FiCaSchedulerNode node : nodes) {
  if (current++ >= start) {
    if (shouldSkipNodeSchedule(node, cs, printSkipedNodeLogging)) {
      continue;
    }
    cs.allocateContainersToNode(node.getNodeID(), false);
  }
}

Is this the most effective way to do multi-node scheduling? Should we allocate based on partitions instead? With the logic above, if we have thousands of nodes in one partition, we will repeatedly access all nodes of that partition thousands of times, because each per-node allocation call considers every node in the partition.
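To make the cost concrete, here is a rough sketch that only counts node visits. It assumes each allocation attempt looks at every node of the target partition; `NodeVisitCount`, `perNodeEntry`, and `perPartitionEntry` are hypothetical names for illustration and are not part of CapacityScheduler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not CapacityScheduler code: counts node visits under
// the assumption that one allocation attempt scans the whole partition.
class NodeVisitCount {

  // Per-node entry point: one attempt per node, each scanning the
  // node's entire partition, so n nodes cost n * n visits.
  static long perNodeEntry(Map<String, Integer> nodesPerPartition) {
    long visits = 0;
    for (int n : nodesPerPartition.values()) {
      visits += (long) n * n;
    }
    return visits;
  }

  // Partition-first entry point: one attempt per partition, so n nodes
  // cost n visits.
  static long perPartitionEntry(Map<String, Integer> nodesPerPartition) {
    long visits = 0;
    for (int n : nodesPerPartition.values()) {
      visits += n;
    }
    return visits;
  }

  public static void main(String[] args) {
    Map<String, Integer> cluster = new HashMap<>();
    cluster.put("default", 3000); // example sizes, chosen for illustration
    cluster.put("gpu", 200);
    System.out.println(perNodeEntry(cluster));      // prints 9040000
    System.out.println(perPartitionEntry(cluster)); // prints 3200
  }
}
```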

I would suggest using separate entry points for node-heartbeat, async-scheduling (single node), and async-scheduling (multi-node).

Node-heartbeat and async-scheduling (single node) can still be similar and share most of the code.

async-scheduling (multi-node): should iterate over partitions first, using pseudocode like:

for (partition : all partitions) {
  allocateContainersOnMultiNodes(getCandidate(partition))
}
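The pseudocode above could be fleshed out roughly as follows. This is only a sketch with stand-in types: `Node`, `MultiNodeAllocator`, `groupByPartition`, and `schedule` are hypothetical and do not correspond to the real YARN classes or to the change that was eventually merged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; none of these types are the real YARN classes.
class PartitionFirstScheduling {

  static final class Node {
    final String id;
    final String partition;

    Node(String id, String partition) {
      this.id = id;
      this.partition = partition;
    }
  }

  // Stand-in for the multi-node allocation call in the pseudocode.
  interface MultiNodeAllocator {
    void allocateContainersOnMultiNodes(String partition, List<Node> candidates);
  }

  // Plays the role of getCandidate(partition): groups nodes by partition,
  // preserving first-seen partition order.
  static Map<String, List<Node>> groupByPartition(List<Node> nodes) {
    Map<String, List<Node>> byPartition = new LinkedHashMap<>();
    for (Node node : nodes) {
      byPartition.computeIfAbsent(node.partition, k -> new ArrayList<>()).add(node);
    }
    return byPartition;
  }

  // Makes one allocation attempt per partition instead of one per node
  // and returns the number of attempts made.
  static int schedule(List<Node> nodes, MultiNodeAllocator allocator) {
    int attempts = 0;
    for (Map.Entry<String, List<Node>> entry : groupByPartition(nodes).entrySet()) {
      allocator.allocateContainersOnMultiNodes(entry.getKey(), entry.getValue());
      attempts++;
    }
    return attempts;
  }
}
```

With this shape, the number of allocation attempts per scheduling pass grows with the number of partitions rather than the number of nodes.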

Issue Links

relates to

YARN-10572 Merge YARN-8557 and YARN-10352, and rebase based YARN-10380.

Resolved

links to

GitHub Pull Request #2494

https://github.com/apache/hadoop/pull/2494

People

Assignee: Qi Zhu

Reporter: Wangda Tan

Votes: 0

Watchers: 9

Dates

Created: 30/Jul/20 17:48

Updated: 12/Feb/24 00:58

Resolved: 09/Dec/20 11:59

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 2h 10m