Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.2.0, 3.1.1
None
None
Description
We are using placement constraints with anti-affinity in an application, together with a node label. The application requests two containers with anti-affinity on a node label that contains only two nodes.
As a result, the two containers are allocated on the two nodes, one per node, which satisfies the anti-affinity constraint.
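To make the setup concrete, here is a minimal Python sketch of anti-affinity placement driven by allocation tags. It is a simplified, hypothetical model: `TagStore` and `place_with_anti_affinity` are illustrative names, not YARN classes, and the real scheduler's constraint checks are far more involved. The only assumption is that a node is rejected while any live container on it carries the tag (`hbase`, as in the log excerpt in this report).

```python
from collections import defaultdict


# Hypothetical, simplified per-node tag counter (not the YARN AllocationTagsManager API).
class TagStore:
    def __init__(self):
        # node -> tag -> number of live containers carrying that tag
        self.counts = defaultdict(lambda: defaultdict(int))

    def add(self, node, tag):
        self.counts[node][tag] += 1

    def remove(self, node, tag):
        self.counts[node][tag] -= 1
        if self.counts[node][tag] == 0:
            del self.counts[node][tag]

    def has(self, node, tag):
        return self.counts[node].get(tag, 0) > 0


def place_with_anti_affinity(store, nodes, tag, num_containers):
    """Place each container on a node that has no container with `tag` yet."""
    placements = []
    for _ in range(num_containers):
        candidates = [n for n in nodes if not store.has(n, tag)]
        if not candidates:
            placements.append(None)  # no eligible node: the request stays pending
            continue
        node = candidates[0]
        store.add(node, tag)
        placements.append(node)
    return placements


store = TagStore()
print(place_with_anti_affinity(store, ["node1", "node2"], "hbase", 2))  # ['node1', 'node2']
```

A further request for the same tag stays pending (`None`) until one of the tags is removed, which is why a tag that is never removed blocks its node indefinitely.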
When one NodeManager goes down for long enough, the RM marks the node as LOST and kills all containers on that node.
The AM now has one pending container request, because the previous container was killed.
When the NodeManager comes back up, the pending container is never allocated on that node again, and the application waits forever for that container.
If the ResourceManager is restarted, the issue disappears and the container is allocated on the NodeManager that recently came back up.
This appears to be caused by allocation tags not being removed.
The allocation tag is added for container container_e68_1595886973474_0005_01_000003:
2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager (AllocationTagsManager.java:addContainer(355)) - Added container=container_e68_1595886973474_0005_01_000003 with tags=[hbase]
However, the allocation tag is not removed when container container_e68_1595886973474_0005_01_000003 is released. There is no corresponding DEBUG message for tag removal, which indicates that the tags are not being removed. Because the tag remains, the anti-affinity constraint prevents the scheduler from allocating the pending container on that node, which produces the observed issue.
2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container FINISHED: container_e68_1595886973474_0005_01_000003
2020-07-28 17:19:34,353 INFO scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:completedContainer(669)) - Container container_e68_1595886973474_0005_01_000003 completed with event FINISHED, but corresponding RMContainer doesn't exist.
This appears to be caused by the changes made in YARN-8511, which remove the tags only after the NM confirms that the container has been released. In our scenario that removal never happens: when the FINISHED event arrives, the corresponding RMContainer no longer exists. As a result, the tag is never removed until the RM restarts.
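The failure sequence can be modeled with a small Python sketch. Everything here is hypothetical and greatly simplified: `Scheduler`, `node_lost`, and `nm_confirms_finished` are illustrative names, not ResourceManager methods. The sketch assumes only the two behaviours described in this report: tags are removed when the NM confirms the release, and by the time that confirmation arrives the RMContainer record is already gone.

```python
# Hypothetical model of the tag leak; names are illustrative, not Hadoop classes.
class Scheduler:
    def __init__(self):
        self.tags = {}           # node -> set of container ids holding the "hbase" tag
        self.rm_containers = {}  # container id -> node (stands in for RMContainer)

    def allocate(self, cid, node):
        self.rm_containers[cid] = node
        self.tags.setdefault(node, set()).add(cid)

    def node_lost(self, node):
        # The RM kills containers on the lost node and drops their RMContainer
        # records, but tag removal is deferred until the NM confirms the release.
        for cid in [c for c, n in self.rm_containers.items() if n == node]:
            del self.rm_containers[cid]

    def nm_confirms_finished(self, cid):
        node = self.rm_containers.get(cid)
        if node is None:
            # Mirrors "corresponding RMContainer doesn't exist": nothing is
            # cleaned up, so the tag leaks.
            return False
        del self.rm_containers[cid]
        self.tags[node].discard(cid)
        return True

    def can_place(self, node):
        # Anti-affinity: reject a node that still carries a tagged container.
        return not self.tags.get(node)


s = Scheduler()
s.allocate("c2", "node1")
s.allocate("c3", "node2")
s.node_lost("node2")
print(s.nm_confirms_finished("c3"))  # False: the RMContainer is already gone
print(s.can_place("node2"))          # False: the stale tag blocks re-allocation
```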
Reverting YARN-8511 fixes this particular issue, and the tags are removed. However, this is not a valid solution, because the problem that YARN-8511 addresses is also real. We need a solution that fixes this issue without breaking YARN-8511.
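One possible direction, sketched here under a simplified and hypothetical model, is to index tags by container ID independently of the RMContainer, so that an NM confirmation can remove the tag even when the RMContainer no longer exists. This keeps the YARN-8511 ordering (tags are removed only after the NM confirms) while closing the leak in this scenario. `TagIndex`, `Scheduler`, and their methods are illustrative names, and this is not necessarily the fix that was adopted upstream.

```python
# Hypothetical sketch: tags keyed by container id, independent of RMContainer.
class TagIndex:
    def __init__(self):
        self.by_container = {}  # container id -> (node, tag)

    def add(self, cid, node, tag):
        self.by_container[cid] = (node, tag)

    def remove(self, cid):
        # Idempotent: a late or duplicate confirmation is harmless.
        self.by_container.pop(cid, None)

    def node_has_tag(self, node, tag):
        # Linear scan keeps the sketch short; a real store would index by node.
        return (node, tag) in self.by_container.values()


class Scheduler:
    def __init__(self):
        self.tags = TagIndex()
        self.rm_containers = {}  # container id -> node (stands in for RMContainer)

    def allocate(self, cid, node, tag="hbase"):
        self.rm_containers[cid] = node
        self.tags.add(cid, node, tag)

    def node_lost(self, node):
        # RMContainer records go away; tags stay until the NM confirms.
        for cid in [c for c, n in self.rm_containers.items() if n == node]:
            del self.rm_containers[cid]

    def nm_confirms_finished(self, cid):
        # Remove the tag on NM confirmation whether or not the RMContainer exists.
        self.rm_containers.pop(cid, None)
        self.tags.remove(cid)

    def can_place(self, node, tag="hbase"):
        return not self.tags.node_has_tag(node, tag)


s = Scheduler()
s.allocate("c3", "node2")
s.node_lost("node2")
print(s.can_place("node2"))  # False: tag kept until the NM confirms
s.nm_confirms_finished("c3")
print(s.can_place("node2"))  # True
```

This does not help if the NodeManager never returns, since no confirmation arrives at all; that case is closer to the node-decommission scenario tracked in the linked YARN-10034.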
Issue Links
- duplicates: YARN-10034 Allocation tags are not removed when node decommission (Resolved)