Details
- Type: Improvement
- Status: Accepted
- Priority: Minor
- Resolution: Unresolved
Description
Currently, there are two timeouts that control what happens when an agent is partitioned from the master:
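1. max_slave_ping_timeouts and slave_ping_timeout together control how long the master waits before declaring a slave to be dead in the "steady state": the slave must fail to respond to max_slave_ping_timeouts pings, each within slave_ping_timeout.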
2. slave_reregister_timeout controls how long the master waits for a slave to reregister after master failover.
It is unclear whether these two cases really merit being treated differently – it might be simpler for operators to configure a single timeout that controls how long the master waits before declaring that a slave is dead.
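To make the difference between the two cases concrete, here is a minimal sketch that computes the effective waiting period in each one. The flag values below are illustrative assumptions, not values taken from this ticket; check the master flags of your Mesos version. The helper name `steady_state_timeout` is hypothetical.

```python
from datetime import timedelta


def steady_state_timeout(slave_ping_timeout, max_slave_ping_timeouts):
    """Approximate time the master waits in the steady state.

    The slave must miss `max_slave_ping_timeouts` pings, each of which
    times out after `slave_ping_timeout`, so the two flags multiply.
    """
    return slave_ping_timeout * max_slave_ping_timeouts


# Assumed example values (not from the ticket).
slave_ping_timeout = timedelta(seconds=15)
max_slave_ping_timeouts = 5
slave_reregister_timeout = timedelta(minutes=10)

steady = steady_state_timeout(slave_ping_timeout, max_slave_ping_timeouts)

print("steady state:  ", steady)
print("after failover:", slave_reregister_timeout)
```

With these assumed values the two cases give quite different windows, which is the inconsistency an operator has to reason about today. A single timeout would replace both numbers with one.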
Issue Links
- relates to
  - MESOS-4049 Allow user to control behavior of partitioned agents/tasks (Resolved)
  - MESOS-5396 After failover, master does not remove agents with same UPID. (Accepted)