Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
At present, DecommissioningNodesWatcher tracks the list of running applications and triggers decommissioning of a node once all the applications that ran on that node have completed. This Jira proposes to solve the following problems:
- DecommissioningNodesWatcher does not track application containers on a node before the node enters the DECOMMISSIONING state; it tracks containers only once the node is in that state. This can lead to shuffle data loss for apps whose containers ran on the node before it was moved to the DECOMMISSIONING state.
- DecommissioningNodesWatcher keeps its own record of running apps. This information can be obtained directly from RMNode.
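
The idea behind both points can be illustrated with a minimal, self-contained sketch. It is not the actual patch: `Node`, `NodeState`, `containerStarted`, `appFinished` and `readyToDecommission` are hypothetical stand-ins invented for illustration, and application IDs are plain strings. The sketch assumes the node itself records every application that ran a container on it, regardless of node state, so a watcher only has to ask the node for that set.

```java
import java.util.HashSet;
import java.util.Set;

public class DecommissionSketch {
    enum NodeState { RUNNING, DECOMMISSIONING }

    /** Hypothetical stand-in for a node record; not the real RMNode interface. */
    static final class Node {
        NodeState state = NodeState.RUNNING;
        // Apps that ran containers on this node and have not finished yet.
        private final Set<String> runningApps = new HashSet<>();

        // Recorded in every node state, not only while DECOMMISSIONING.
        void containerStarted(String appId) { runningApps.add(appId); }

        void appFinished(String appId) { runningApps.remove(appId); }

        // Defensive copy so callers cannot mutate the node's bookkeeping.
        Set<String> getRunningApps() { return new HashSet<>(runningApps); }
    }

    /** The watcher keeps no app list of its own; it queries the node. */
    static boolean readyToDecommission(Node node) {
        return node.state == NodeState.DECOMMISSIONING
            && node.getRunningApps().isEmpty();
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.containerStarted("app_1");          // container ran before decommissioning
        node.state = NodeState.DECOMMISSIONING;
        System.out.println(readyToDecommission(node)); // false: app_1 may still need shuffle data
        node.appFinished("app_1");
        System.out.println(readyToDecommission(node)); // true
    }
}
```

Because the application was recorded while the node was still RUNNING, the node is not released until that application finishes, which is the shuffle-data-loss case described in the first point.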
Attachments
Issue Links
- is related to
  - YARN-11466 Graceful Decommission for Shuffle Services (Open)
  - YARN-11197 Backport YARN-9608 - DecommissioningNodesWatcher should get lists of running applications on node from RMNode. (Resolved)