Details
- Type: Bug
- Status: Patch Available
- Priority: Major
- Resolution: Unresolved
Description
This issue addresses reservation problems that occur when multi-node placement is enabled:
- As discussed in YARN-9576, a re-reservation proposal may always be generated on the same node, which breaks scheduling for this app and for later apps. I think re-reservation is unnecessary here; we can replace it with LOCALITY_SKIPPED so that the scheduler has a chance to look at the following candidates for this app when multi-node placement is enabled.
- The scheduler iterates over all nodes and tries to allocate for the reserved container in LeafQueue#allocateFromReservedContainer. There are two problems here:
  - Only the node of the reserved container should be taken as a candidate, instead of all nodes, when calling FiCaSchedulerApp#assignContainers. Otherwise the scheduler may later generate a reservation-fulfilled proposal on another node, which will always be rejected in FiCaSchedulerApp#commonCheckContainerAllocation.
  - The assignment returned by FiCaSchedulerApp#assignContainers is never null, even when the allocation was just skipped. This breaks the normal scheduling process for this leaf queue because of the if clause in LeafQueue#assignContainers: "if (null != assignment) { return assignment;}"
- Nodes that already have a reserved container should be skipped when iterating over candidates in RegularContainerAllocator#allocate. Otherwise the scheduler may generate an allocation or reservation proposal on these nodes, which will always be rejected in FiCaSchedulerApp#commonCheckContainerAllocation.
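The points above can be illustrated with a minimal, self-contained sketch. It is not the actual patch and does not use the real YARN classes: `ReservationSketch`, `Node`, `Outcome`, and every method name below are hypothetical stand-ins chosen only to show the intended behaviour, assuming a node is modelled as an id plus a "has reserved container" flag.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical stand-ins; these are NOT the real Hadoop YARN classes.
final class ReservationSketch {

  /** Possible outcomes of one allocation attempt in this toy model. */
  enum Outcome { ALLOCATED, RE_RESERVED, LOCALITY_SKIPPED }

  /** Minimal node model: an id plus whether a container is already reserved on it. */
  static final class Node {
    final String id;
    final boolean hasReservedContainer;

    Node(String id, boolean hasReservedContainer) {
      this.id = id;
      this.hasReservedContainer = hasReservedContainer;
    }
  }

  // First point: with multi-node placement enabled, report a skip instead of
  // re-reserving, so the caller can move on to the next candidate node.
  static Outcome onUnsatisfiedReservation(boolean multiNodeEnabled) {
    return multiNodeEnabled ? Outcome.LOCALITY_SKIPPED : Outcome.RE_RESERVED;
  }

  // Second point, first sub-item: when fulfilling a reservation, the only
  // candidate is the node that holds the reserved container.
  static List<Node> candidatesForReservedContainer(Node reservedNode) {
    return Collections.singletonList(reservedNode);
  }

  // Second point, second sub-item: map a skipped result to null so that a
  // caller's "if (null != assignment) return assignment;" check falls through.
  static Outcome normalizeReservedResult(Outcome outcome) {
    return outcome == Outcome.LOCALITY_SKIPPED ? null : outcome;
  }

  // Third point: regular allocation ignores nodes that already hold a reservation.
  static List<Node> candidatesForRegularAllocation(List<Node> allNodes) {
    List<Node> result = new ArrayList<>();
    for (Node node : allNodes) {
      if (!node.hasReservedContainer) {
        result.add(node);
      }
    }
    return result;
  }
}
```

In this model, a cluster with nodes n1 (reserved) and n2 (free) yields only n2 as a regular-allocation candidate and only n1 as the candidate for fulfilling the reservation, so no proposal is generated that the later commit check would have to reject.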
Attachments
Issue Links
- is related to
  - YARN-10259 Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (Resolved)
- relates to
  - YARN-11573 Add config option to make container allocation prefer nodes without reserved containers (Resolved)