Affects Version/s: 0.28.2
Observed in an OpenStack environment where each master lives on a separate VM.
Sprint: Mesosphere Sprint 38
We observed the following situation in a cluster of five masters:
||Time||Master 1||Master 2||Master 3||Master 4||Master 5||
|1|Follower|Follower|Follower|Follower|Partitioned from cluster by downing this VM's network|
|2|Elected Leader by ZK|Voting|Voting|Voting|Suicides due to lost leadership|
|3|Performs consensus|Replies to leader|Replies to leader|Replies to leader|Still down|
|4|Performs writing|Acks to leader|Acks to leader|Acks to leader|Still down|
|6|Leader|Follower|Follower|Follower|Comes back up|
|8|Partitioned in the same way as Master 5|Follower|Follower|Follower|Follower|
|9|Suicides due to lost leadership|Elected Leader by ZK|Follower|Follower|Follower|
|10|Still down|Performs consensus|Replies to leader|Replies to leader|Doesn't get the message!|
|11|Still down|Performs writing|Acks to leader|Acks to leader|Acks to leader|
Master 2 sends a series of messages to the recently restarted Master 5. The first message is dropped, but subsequent messages are delivered.
This appears to be due to a stale link between the masters. Before leader election, the replicated log actors create a network watcher, which adds links to masters that join the ZK group:
The link from Master 2 to Master 5 does not appear to break when Master 5 goes down, perhaps because the network partition was induced in the hypervisor layer rather than inside the VM itself.
When Master 2 tries to send a PromiseRequest to Master 5, we do not observe the expected log message.
Instead, we see a log line in Master 2:
The libprocess socket_manager then removes the broken link, and the subsequent WriteRequest from Master 2 to Master 5 succeeds over a new socket.
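The suspected sequence can be illustrated with a toy model. This is only a sketch of the behavior described above, not libprocess code: the `Peer`, `Socket`, and `SocketManager` classes and the `generation` counter are hypothetical names made up for illustration, and the model assumes the sender never learns that the peer restarted until it tries to write.

```python
class Peer:
    """Hypothetical remote master. `generation` increments on each restart."""

    def __init__(self, name):
        self.name = name
        self.generation = 0
        self.inbox = []

    def restart(self):
        # A restart invalidates every connection opened before it.
        self.generation += 1
        self.inbox = []


class Socket:
    """Toy connection tied to the peer generation that existed when it was opened."""

    def __init__(self, peer):
        self.peer = peer
        self.generation = peer.generation

    def write(self, message):
        if self.generation != self.peer.generation:
            raise ConnectionError("stale socket")
        self.peer.inbox.append(message)


class SocketManager:
    """Toy stand-in for a connection cache (assumed behavior, not the real one)."""

    def __init__(self):
        self.links = {}

    def link(self, peer):
        # Open a persistent connection once, e.g. when the peer joins the group.
        self.links.setdefault(peer.name, Socket(peer))

    def send(self, peer, message):
        sock = self.links.get(peer.name)
        if sock is None:
            sock = Socket(peer)
            self.links[peer.name] = sock
        try:
            sock.write(message)
            return True
        except ConnectionError:
            # The message is dropped (no retry) and the broken link is removed,
            # so the next send opens a fresh socket.
            del self.links[peer.name]
            return False


master5 = Peer("master5")
manager = SocketManager()
manager.link(master5)   # link created before leader election
master5.restart()       # partition plus restart; the sender sees no disconnect

results = [
    manager.send(master5, "PromiseRequest"),
    manager.send(master5, "WriteRequest"),
]
print(results, master5.inbox)
```

In this model the PromiseRequest is lost on the stale link and only the WriteRequest reaches Master 5, which matches rows 10 and 11 of the table.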