Details
Type: Bug
Status: Reviewable
Priority: Major
Resolution: Unresolved
Description
Currently, a LostSlaveMessage (in v1, a type of Event::Failure) is broadcast to all registered frameworks in the cluster whenever an agent is lost.
This is unnecessary and breaks the Mesos abstraction: frameworks are given a slice of the cluster, not the entirety. They learn about their slice when offers are extended to them, so we shouldn't inform all of them whenever an agent goes away.
This message should instead be narrowcast only to the frameworks that have a stake in the lost agent: running tasks, pending offers, reservations, persistent volumes, etc.
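
The proposed narrowcast can be sketched as follows. This is only an illustration of the filtering logic in Python, not Mesos master code: the `Agent` record, its field names, `frameworks_with_stake`, and `notify_agent_lost` are all hypothetical, and it assumes each kind of stake (including reservations and persistent volumes) can be traced back to a set of framework IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical, simplified view of per-agent bookkeeping."""
    agent_id: str
    # Framework IDs grouped by the kind of stake they hold on this agent.
    running_tasks: set = field(default_factory=set)
    pending_offers: set = field(default_factory=set)
    reservations: set = field(default_factory=set)
    persistent_volumes: set = field(default_factory=set)


def frameworks_with_stake(agent):
    """Return the IDs of all frameworks holding any stake in the agent."""
    return (
        agent.running_tasks
        | agent.pending_offers
        | agent.reservations
        | agent.persistent_volumes
    )


def notify_agent_lost(agent, registered_frameworks, send):
    """Send the lost-agent notification to registered stakeholders only.

    `send` is a hypothetical callback taking (framework_id, agent_id).
    Returns the sorted list of framework IDs that were notified.
    """
    recipients = sorted(frameworks_with_stake(agent) & set(registered_frameworks))
    for framework_id in recipients:
        send(framework_id, agent.agent_id)
    return recipients
```

With this shape, a framework that is registered but holds no tasks, offers, reservations, or volumes on the lost agent receives nothing, which is the behavior change this ticket asks for.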