Details
- Improvement
- Status: Accepted
- Major
- Resolution: Unresolved
- 0.19.0
- None
Description
Currently, starting a writer involves (1) recovery and (2) coordinator election.
(1) uses internal retries to ensure progress is made, whereas (2) does not. This means that if the implicit promise requests are dropped, we end up waiting for the full fetch timeout in the Registrar to expire.
We could reduce the number of master failovers by adding an implicit retry for coordinator election. Alternatively, explicit retries in the caller of Log are possible, but they would conflate the retries for (1) and (2).
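As a rough illustration of the implicit-retry idea, here is a minimal sketch in Python. It is not the Mesos replicated log API: `elect_with_retries`, `elect_once`, and the timeout and backoff parameters are all hypothetical names chosen for this example. The sketch assumes a single election attempt either returns a log position or returns `None` when it times out (for instance because promise requests were dropped).

```python
from typing import Callable, List, Optional, Tuple


def elect_with_retries(
    elect_once: Callable[[float], Optional[int]],
    max_attempts: int = 5,
    initial_timeout: float = 1.0,
    backoff: float = 2.0,
) -> Optional[int]:
    """Retry one election attempt until it succeeds or attempts run out.

    `elect_once(timeout)` is a hypothetical callable: it returns the
    elected position, or None if the attempt timed out.
    """
    timeout = initial_timeout
    for _ in range(max_attempts):
        position = elect_once(timeout)
        if position is not None:
            return position
        # The attempt timed out; give the next one a longer per-attempt timeout.
        timeout *= backoff
    return None


def make_flaky_elect(
    failures: int, position: int
) -> Tuple[Callable[[float], Optional[int]], List[float]]:
    """Build a fake election that times out `failures` times, then succeeds.

    Also returns the list of timeouts it was called with, for inspection.
    """
    calls: List[float] = []

    def elect_once(timeout: float) -> Optional[int]:
        calls.append(timeout)
        return None if len(calls) <= failures else position

    return elect_once, calls
```

With short per-attempt timeouts, a dropped promise request costs one short retry instead of the full fetch timeout in the caller. Keeping the retry inside the election step also keeps it separate from the retries that recovery already performs, which is the concern raised about explicit retries in the caller.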
Issue Links
- is related to MESOS-3280 Master fails to access replicated log after network partition (Resolved)