[YARN-73] nodemanager should cleanup running containers when it starts - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.23.3
Fix Version/s: None
Component/s: nodemanager
Labels: None

Description

Currently the nodemanager doesn't clean up running containers when it gets restarted. This can cause containers to get lost and stick around forever. We've seen this happen multiple times when the RM is restarted. When the RM is brought back up, it doesn't know what was running on the cluster, so it tells the NMs to reboot, and when an NM reboots it loses track of what it had running. If any containers are behaving badly, nothing is left that knows about them and can kill them.

We should kill any running containers when the nodemanager is being started. Note that when the NM is being brought up, it needs to somehow figure out which containers were running and be sure it doesn't kill anything it shouldn't.
Note that we should also try to kill any running containers when the nodemanager is shutting down (jira 4213 was filed for this).
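
As a rough illustration of the startup cleanup described above, here is a minimal sketch in Python. It is not the NodeManager's implementation. It assumes, hypothetically, that the previous instance wrote one `<container-id>.pid` file per container into a known directory; the function names and the pid-file layout are made up for the example. Only pids found in those records are signalled, which is one way to avoid killing unrelated processes. A real implementation would also have to guard against pid reuse, which this sketch does not do.

```python
import os
import signal
from pathlib import Path


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX signal-0 probe)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def cleanup_leftover_containers(pid_dir, is_alive=pid_is_alive, kill=os.kill):
    """Signal processes recorded by a previous instance.

    pid_dir is a hypothetical directory of '<container-id>.pid' files.
    Returns the ids of the containers that were sent SIGTERM.
    """
    signalled = []
    for pid_file in sorted(Path(pid_dir).glob("*.pid")):
        try:
            pid = int(pid_file.read_text().strip())
        except ValueError:
            continue  # unreadable record: leave it alone rather than guess
        if is_alive(pid):
            kill(pid, signal.SIGTERM)
            signalled.append(pid_file.stem)
        pid_file.unlink()  # the record has been handled either way
    return signalled
```

The liveness check and the kill call are passed in as parameters so the logic can be exercised without signalling real processes.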

This might change a bit when RM restart is implemented, if tasks can actually survive an RM/NM reboot, but that can be addressed at that point.

Issue Links

duplicates
YARN-438 NM on startup should handle cleaning up of any running containers managed by the previous instance (Resolved)

is duplicated by
YARN-495 Change NM behavior of reboot to resync (Closed)

is related to
YARN-71 Ensure/confirm that the NodeManager cleans up local-dirs on restart (Closed)
YARN-72 NM should handle cleaning up containers when it shuts down (Closed)

People

Assignee: Unassigned
Reporter: Thomas Graves
Votes: 0
Watchers: 14

Dates

Created: 01/May/12 13:39
Updated: 12/May/13 00:11
Resolved: 12/May/13 00:10