Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 0.23.0
- Zookeeper version 3.4.5--1
- 8
Description
In a 5-node cluster with 3 masters and 2 slaves, and ZooKeeper (ZK) running on each node, forcing a network partition apparently causes all the masters to lose access to their replicated log. The leading master halts; the reason is unknown, but it is presumably related to replicated log access. The other masters then fail to recover from the replicated log, also for unknown reasons. This could be caused by the ZK setup, but it might also be a Mesos bug.
This was observed in a Chronos test drive scenario described in detail here:
https://github.com/mesos/chronos/issues/511
Setup instructions are here:
https://github.com/mesos/chronos/issues/508
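When reproducing the partition, it may help to check which side of the split can still form a majority, both for the 5-node ZK ensemble and for the 3 masters. The sketch below is only an illustration: the node names and the partition layout are hypothetical (the report does not say which nodes ended up on which side), and it assumes the masters' replicated log quorum is a simple majority of the masters, which the report does not state.

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def sides_with_quorum(partition, members):
    """Return the partition sides that still hold a majority of `members`.

    `partition` is a list of sets of node names; `members` is the set of
    nodes that participate in the ensemble being checked.
    """
    needed = majority(len(members))
    return [side for side in partition if len(side & members) >= needed]


# Hypothetical node names and split; not taken from the report.
masters = {"n1", "n2", "n3"}
zk_nodes = {"n1", "n2", "n3", "n4", "n5"}
partition = [{"n1", "n4", "n5"}, {"n2", "n3"}]

# Sides that keep a ZK majority (3 of 5 needed).
print(sides_with_quorum(partition, zk_nodes))
# Sides that keep a master majority (2 of 3 needed).
print(sides_with_quorum(partition, masters))
```

In this made-up split, the side holding the ZK majority contains only one master, while the side holding two masters has no ZK majority, so neither side has both. Whether the observed failure matches such a layout is not known from the report.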
Attachments
Issue Links
- is related to
  - MESOS-3532 3 Master HA setup restarts every 3 minutes (Resolved)
- relates to
  - MESOS-1399 Add retries for co-ordinator election. (Accepted)