[YARN-10378] When NM goes down and comes back up, PC allocation tags are not removed for completed containers - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.2.0, 3.1.1
Fix Version/s: None
Component/s: capacity scheduler
Labels:
None

Description

We are using placement constraints with anti-affinity in an application, together with a node label. The application requests two containers with anti-affinity on a node label that contains only two nodes.

So the two containers are allocated on the two nodes, one per node, which satisfies the anti-affinity constraint.

When one NodeManager goes down for some time, the RM marks the node as lost and kills all containers on that node.

The AM will now have one pending container request, since the previous container got killed.

When the NodeManager comes back up after some time, the pending container is not allocated on that node again, and the application waits forever for that container.

If the ResourceManager is restarted, this issue disappears and the container gets allocated on the NodeManager which came back up recently.

This seems to be caused by the allocation tags not being removed.

The allocation tag is added for the container container_e68_1595886973474_0005_01_000003:

2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager (AllocationTagsManager.java:addContainer(355)) - Added container=container_e68_1595886973474_0005_01_000003 with tags=[hbase]

However, the allocation tag is not removed when the container container_e68_1595886973474_0005_01_000003 is released. No corresponding DEBUG message for tag removal appears in the log, which indicates that the tags are not being removed. With the tag still in place, the scheduler will not allocate on the same node because of anti-affinity, which results in the issue observed.

2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container FINISHED: container_e68_1595886973474_0005_01_000003
2020-07-28 17:19:34,353 INFO  scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:completedContainer(669)) - Container container_e68_1595886973474_0005_01_000003 completed with event FINISHED, but corresponding RMContainer doesn't exist.

This seems to be due to the changes made in YARN-8511, which changed tag removal to happen only after the NM confirms that the container is released. However, in our scenario this does not happen, so the tag is never removed until the RM is restarted.

Reverting YARN-8511 fixes this particular issue, and the tags are removed. But this is not a valid solution, since the problem that YARN-8511 solves is also valid. We need a solution that fixes this issue without breaking YARN-8511.
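
To make the suspected sequence concrete, here is a minimal toy model of it. This is only a sketch under assumptions: the class `ToyTagScheduler` and all of its methods are hypothetical and are not YARN classes or APIs, and the assumption that a lost node drops the container record while leaving the tag for a later NM confirmation is inferred from the log messages above, not taken from the YARN source.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these names are real YARN classes or methods. */
class ToyTagScheduler {
    // node -> (allocation tag -> number of containers on that node carrying it)
    private final Map<String, Map<String, Integer>> tagsByNode = new HashMap<>();
    // container id -> {node, tag}; a stand-in for the scheduler's container records
    private final Map<String, String[]> records = new HashMap<>();

    /** Anti-affinity check: a node is eligible only if no container there has this tag. */
    boolean canPlace(String node, String tag) {
        Map<String, Integer> tags =
            tagsByNode.getOrDefault(node, Collections.<String, Integer>emptyMap());
        return tags.getOrDefault(tag, 0) == 0;
    }

    /** Allocates a container and records its tag if anti-affinity allows it. */
    boolean allocate(String id, String node, String tag) {
        if (!canPlace(node, tag)) {
            return false;
        }
        tagsByNode.computeIfAbsent(node, n -> new HashMap<>()).merge(tag, 1, Integer::sum);
        records.put(id, new String[] {node, tag});
        return true;
    }

    /** Assumption: a lost node drops its container records but keeps the tags. */
    void nodeLost(String node) {
        records.values().removeIf(r -> r[0].equals(node));
    }

    /** NM confirmation: the tag is removed only if the container record still exists. */
    boolean nodeConfirmsFinished(String id) {
        String[] r = records.remove(id);
        if (r == null) {
            return false; // mirrors "corresponding RMContainer doesn't exist"
        }
        tagsByNode.get(r[0]).merge(r[1], -1, Integer::sum);
        return true;
    }

    /** Stand-in for an RM restart: tag state is rebuilt from the surviving records. */
    void rebuildTags() {
        tagsByNode.clear();
        for (String[] r : records.values()) {
            tagsByNode.computeIfAbsent(r[0], n -> new HashMap<>()).merge(r[1], 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        ToyTagScheduler s = new ToyTagScheduler();
        s.allocate("c2", "node1", "hbase");
        s.allocate("c3", "node2", "hbase");
        s.nodeLost("node2");                                 // node2 goes down
        System.out.println(s.nodeConfirmsFinished("c3"));    // false: record is already gone
        System.out.println(s.canPlace("node2", "hbase"));    // false: stale tag blocks node2
        s.rebuildTags();                                     // simulated RM restart
        System.out.println(s.canPlace("node2", "hbase"));    // true: stale tag is gone
    }
}
```

In this model the tag on node2 outlives the container that carried it, so the anti-affinity check keeps rejecting node2 until the tag state is rebuilt, which matches the behaviour reported above.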

Issue Links

duplicates

YARN-10034 Allocation tags are not removed when node decommission

Resolved

People

Assignee: Tarun Parimi

Reporter: Tarun Parimi

Votes: 0

Watchers: 1

Dates

Created: 30/Jul/20 06:50

Updated: 30/Jul/20 09:35

Resolved: 30/Jul/20 09:35