Details
- Type: Bug
- Status: Accepted
- Priority: Major
- Resolution: Unresolved
Description
If a framework becomes disconnected from the master, its tasks are killed after waiting for failover_timeout.
However, if a master failover occurs and a framework never reconnects to the new master, we never kill any of the tasks associated with that framework. These tasks remain orphaned and would presumably need to be removed manually by the operator. Similarly, if a framework is torn down or disconnects while it has running tasks on a partitioned agent, those tasks are not shut down when the agent reregisters.
We should consider whether to kill such orphaned tasks automatically, likely after waiting for some (framework-configurable?) timeout.
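To make the proposal concrete, here is a minimal sketch of the orphan check, assuming the master records when it was elected and which frameworks have re-registered since. This is not Mesos code: `Task`, `find_expired_orphans`, and `orphan_timeout` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Task:
    task_id: str
    framework_id: str


def find_expired_orphans(
    tasks: Iterable[Task],
    registered_frameworks: Set[str],
    failover_time: float,
    now: float,
    orphan_timeout: float,
) -> List[Task]:
    """Return tasks whose framework has not re-registered with the new master.

    Nothing is returned until `orphan_timeout` seconds have passed since the
    failover, so frameworks get a grace period in which to reconnect.
    All names and the timeout itself are assumptions, not Mesos APIs.
    """
    if now - failover_time < orphan_timeout:
        return []
    return [t for t in tasks if t.framework_id not in registered_frameworks]
```

With a 60-second timeout, a task belonging to a framework that has not re-registered is reported only once 60 seconds have elapsed since failover. Whether the timeout should be global or configurable per framework is exactly the open question raised above.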
Issue Links
- is duplicated by
  - MESOS-5378 Terminating a framework during master failover leads to orphaned tasks (Resolved)
  - MESOS-5761 Improve the logic of orphan tasks (Resolved)
- is related to
  - MESOS-1719 Master should persist framework information (Accepted)
- relates to
  - MESOS-6419 The 'master/teardown' endpoint should support tearing down 'unregistered_frameworks'. (Resolved)
  - MESOS-6602 Shutdown completed frameworks when unreachable agent re-registers (Resolved)
  - MESOS-6136 Duplicate framework id handling (Open)