[MESOS-295] Allow new masters to have better understanding of cluster state - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: 0.19.0
Component/s: None
Labels:
- twitter

Description

When a new master is elected, it knows only the current state of the cluster and nothing about what happened before the failover. As a result, tasks that were marked LOST can keep running without ever being killed. For instance:

1) A set of machines (perhaps a datacenter rack) loses network connectivity, and the master marks their tasks as LOST. However, the tasks are still running.
2) Through a potentially unrelated situation, the master fails over to a new master.
3) The network connection to the machines comes back up.
4) These slaves never killed their tasks (and they shouldn't if they can't talk to a master).
5) The tasks stay running and aren't killed, taking up resources and running outside the scope of the new master.
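
The scenario above can be sketched as a toy model. This is only an illustration, not Mesos code: `Slave`, `Master`, `removed_slaves`, and `reregister` are hypothetical names, and the "shutdown" response is an assumed remedy that presumes the new master can read a durable record of the slaves its predecessor removed (the kind of persistence the linked MESOS-764 is about).

```python
# Toy model of the failover scenario; none of these names are Mesos APIs.
from dataclasses import dataclass, field


@dataclass
class Slave:
    slave_id: str
    running_tasks: set = field(default_factory=set)


@dataclass
class Master:
    # Hypothetical durable record of slaves the previous master removed.
    # The empty default models a newly elected master with no history.
    removed_slaves: frozenset = frozenset()
    known_tasks: dict = field(default_factory=dict)

    def reregister(self, slave: Slave) -> str:
        """Handle a slave that reconnects after a network partition."""
        if slave.slave_id in self.removed_slaves:
            # Its tasks were already reported LOST, so tell the slave to stop them.
            slave.running_tasks.clear()
            return "shutdown"
        # No history: the slave is accepted and its stale tasks keep running.
        self.known_tasks[slave.slave_id] = set(slave.running_tasks)
        return "accepted"
```

With `Master()` (no history), a reconnecting slave is accepted and its tasks survive, which is the problem described in this issue. With `Master(removed_slaves=frozenset({"slave-1"}))`, the same slave is told to shut down and its tasks are cleared.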

Issue Links

blocks: MESOS-338 Mesos 1.0 (Resolved)

is blocked by: MESOS-764 Implement Master persistence using the Registrar. (Resolved)

People

Assignee: Benjamin Mahler

Reporter: Joe Smith

Votes: 0

Watchers: 7

Dates

Created: 23/Oct/12 23:50

Updated: 02/Jul/15 18:43

Resolved: 28/Apr/14 17:42