[MESOS-4048] Consider unifying slave timeout behavior between steady state and master failover - ASF JIRA

Details

Type: Improvement
Status: Accepted
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: agent, master
Labels:
- mesosphere

Description

Currently, there are two timeouts that control what happens when an agent is partitioned from the master:

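1. max_slave_ping_timeouts and slave_ping_timeout together control how long the master waits before declaring a slave dead in the "steady state": the slave is declared dead after max_slave_ping_timeouts consecutive pings each go unanswered for slave_ping_timeout.
2. slave_reregister_timeout controls how long the master waits for a slave to re-register after a master failover.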

It is unclear whether these two cases really merit being treated differently – it might be simpler for operators to configure a single timeout that controls how long the master waits before declaring that a slave is dead.
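
To make the comparison concrete, the two waiting periods can be computed side by side. The following is a minimal sketch, not Mesos code: the `MasterTimeouts` class and the `unified` helper are hypothetical names introduced here, and the numeric values are placeholders chosen for illustration (check the flag documentation for your Mesos version). It assumes the steady-state wait is the product of the ping timeout and the allowed number of missed pings.

```python
import math
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class MasterTimeouts:
    """Hypothetical container for the three master flags named in this issue."""
    slave_ping_timeout: timedelta
    max_slave_ping_timeouts: int
    slave_reregister_timeout: timedelta

    def steady_state_timeout(self) -> timedelta:
        # Assumption: the slave is declared dead once this many
        # consecutive pings have each timed out.
        return self.slave_ping_timeout * self.max_slave_ping_timeouts

    def failover_timeout(self) -> timedelta:
        # After master failover, only the re-registration timeout applies.
        return self.slave_reregister_timeout


def unified(timeout: timedelta, ping_timeout: timedelta) -> MasterTimeouts:
    """Sketch of the proposal: derive both cases from one operator-facing timeout."""
    # Round up so the steady-state wait is never shorter than requested.
    pings = math.ceil(timeout / ping_timeout)
    return MasterTimeouts(ping_timeout, pings, timeout)


# Placeholder values for illustration only.
current = MasterTimeouts(
    slave_ping_timeout=timedelta(seconds=15),
    max_slave_ping_timeouts=5,
    slave_reregister_timeout=timedelta(minutes=10),
)
proposed = unified(timedelta(minutes=10), timedelta(seconds=15))

print(current.steady_state_timeout(), current.failover_timeout())
print(proposed.steady_state_timeout(), proposed.failover_timeout())
```

With the placeholder values, the two cases differ by a large factor under the current scheme, while the hypothetical `unified` helper yields the same wait in both. Whether that is desirable is exactly the open question of this issue.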

Issue Links

relates to
- MESOS-4049 Allow user to control behavior of partitioned agents/tasks (status: Resolved)
- MESOS-5396 After failover, master does not remove agents with same UPID. (status: Accepted)

People

Assignee: Megha Sharma
Reporter: Neil Conway
Votes: 0
Watchers: 7

Dates

Created: 02/Dec/15 19:54
Updated: 26/Nov/18 13:36