[MESOS-1399] Add retries for co-ordinator election. - ASF JIRA

Details

Type: Improvement
Status: Accepted
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.19.0
Fix Version/s: None
Component/s: master, replicated log
Labels:
- mesosphere

Description

Currently, starting a writer involves two steps: (1) recovering and (2) co-ordinator election.

(1) uses internal retries to ensure progress is made, whereas (2) does not. This means that if the implicit promise requests are dropped, we'll end up waiting the full fetch timeout in the Registrar.

We could reduce the number of master failovers by adding an implicit retry for co-ordinator election. Alternatively, the caller of Log could retry explicitly, but that conflates the retries for (1) and (2).
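
As a rough illustration of the implicit-retry idea, the sketch below repeats a single election attempt with a short per-attempt timeout until it succeeds or an overall deadline passes. It is written in Python for brevity and is not Mesos code: `elect_with_retries`, the `elect` callable, and the timeout parameters are all hypothetical names, and the real replicated log is built on asynchronous C++ futures rather than a blocking loop.

```python
import time
from typing import Callable, Optional


def elect_with_retries(
    elect: Callable[[float], Optional[int]],
    attempt_timeout: float,
    overall_timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Retry one election attempt until it succeeds or the deadline passes.

    `elect(timeout)` is a hypothetical callable: it returns the writer's
    position on success, or None if the promise requests went unanswered
    within `timeout` seconds.
    """
    deadline = clock() + overall_timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            # Overall budget exhausted; the caller sees a single failure.
            return None
        # Never let one attempt run past the overall deadline.
        position = elect(min(attempt_timeout, remaining))
        if position is not None:
            return position
```

With a short `attempt_timeout`, a dropped promise request costs one short attempt rather than the caller's full timeout, and the retry stays separate from the retries already done during recovery.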

Issue Links

is related to: MESOS-3280 Master fails to access replicated log after network partition (Resolved)

People

Assignee: Unassigned

Reporter: Benjamin Mahler

Votes: 0

Watchers: 3

Dates

Created: 20/May/14 22:30

Updated: 02/Oct/15 15:20