Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- 2.7.1
- None
- None
Description
Containers to be preempted leak in the FairScheduler preemption logic. This may cause missed preemption because containers in warnedContainers are wrongly removed. The problem is in preemptResources.
There are two issues that can cause containers to be wrongly removed from warnedContainers:
First, the container state RMContainerState.ACQUIRED is missing from the condition check:
(container.getState() == RMContainerState.RUNNING || container.getState() == RMContainerState.ALLOCATED)
Second, if isResourceGreaterThanNone(toPreempt) returns false, we shouldn't remove the container from warnedContainers. We should only remove a container from warnedContainers if it is not in any of the states RMContainerState.RUNNING, RMContainerState.ALLOCATED, or RMContainerState.ACQUIRED. The current code is:
if ((container.getState() == RMContainerState.RUNNING ||
    container.getState() == RMContainerState.ALLOCATED) &&
    isResourceGreaterThanNone(toPreempt)) {
  warnOrKillContainer(container);
  Resources.subtractFrom(toPreempt, container.getContainer().getResource());
} else {
  warnedIter.remove();
}
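One way the two points above could be combined is sketched here. This is a simplified model, not the patch that resolved this issue: `State`, `Container`, `isPreemptable`, and `processWarned` are hypothetical stand-ins for the YARN types, and a resource is reduced to a single integer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the YARN classes; none of these names are real YARN APIs.
class WarnedContainersSketch {
    enum State { ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

    static final class Container {
        final String id;
        final State state;
        final int memory; // resource size, reduced to one dimension for the sketch

        Container(String id, State state, int memory) {
            this.id = id;
            this.state = state;
            this.memory = memory;
        }
    }

    // ACQUIRED is treated the same as RUNNING and ALLOCATED (first issue).
    static boolean isPreemptable(State s) {
        return s == State.RUNNING || s == State.ALLOCATED || s == State.ACQUIRED;
    }

    // Walks the warned list once. Returns the ids that would be warned or killed.
    // A container is dropped from the list only when its state is no longer
    // preemptable, never because toPreempt has run out (second issue).
    static List<String> processWarned(List<Container> warned, int toPreempt) {
        List<String> acted = new ArrayList<>();
        Iterator<Container> it = warned.iterator();
        while (it.hasNext()) {
            Container c = it.next();
            if (!isPreemptable(c.state)) {
                it.remove();
            } else if (toPreempt > 0) {
                acted.add(c.id);       // stands in for warnOrKillContainer(container)
                toPreempt -= c.memory; // stands in for Resources.subtractFrom(...)
            }
            // Otherwise: still preemptable but nothing left to reclaim; keep it warned.
        }
        return acted;
    }
}
```

With a warned list of a RUNNING, an ACQUIRED, and a COMPLETED container of equal size and a budget that covers only one of them, this sketch acts on the first, keeps the second in the list for a later round, and removes only the third.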
Also, once a container is wrongly removed from warnedContainers, it will never be preempted. This is because the container is already in FSAppAttempt#preemptionMap, and FSAppAttempt#preemptContainer won't return containers that are in FSAppAttempt#preemptionMap:
public RMContainer preemptContainer() {
  if (LOG.isDebugEnabled()) {
    LOG.debug("App " + getName() + " is going to preempt a running " +
        "container");
  }

  RMContainer toBePreempted = null;
  for (RMContainer container : getLiveContainers()) {
    if (!getPreemptionContainers().contains(container) &&
        (toBePreempted == null ||
            comparator.compare(toBePreempted, container) > 0)) {
      toBePreempted = container;
    }
  }
  return toBePreempted;
}
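The leak described above can be reproduced in a toy model. The sketch below is an illustration under assumptions, not YARN code: containers are plain strings, `pickCandidate` mimics the selection loop of preemptContainer using natural string ordering in place of the real comparator, and `isLeaked` is a hypothetical helper that names the stuck condition.

```java
import java.util.List;
import java.util.Set;

// Toy model of the leak; container ids are strings and all names are hypothetical.
class PreemptionLeakSketch {
    // Mirrors the shape of preemptContainer(): skip anything already marked for
    // preemption, then keep the smallest remaining id (stand-in for the comparator).
    static String pickCandidate(List<String> live, Set<String> marked) {
        String best = null;
        for (String id : live) {
            if (!marked.contains(id) && (best == null || best.compareTo(id) > 0)) {
                best = id;
            }
        }
        return best;
    }

    // A container is stuck when it is still live and still marked for preemption,
    // but no longer tracked in the warned set that drives warn-or-kill.
    static boolean isLeaked(String id, List<String> live, Set<String> marked,
                            Set<String> warned) {
        return live.contains(id) && marked.contains(id) && !warned.contains(id);
    }
}
```

If "c1" is live and marked but has been dropped from the warned set, `pickCandidate` skips it on every call and nothing else revisits it, which is the situation the description calls a leak.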
Attachments
Issue Links
- is part of: YARN-4752 FairScheduler should preempt for a ResourceRequest and all preempted containers should be on the same node (Resolved)