Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Not A Problem
-
3.4.6
-
None
-
None
-
Ubuntu 12.04, OpenJDK 1.6
Description
In 3-node cluster, when there are 2 nodes die and reboot during leader election, it might lead to the case that there are 2 leaders happen in the system. Eventually, a leader that does not has follower supports and quit being leader, but it makes us lose some availability.
I am building a tools that can reorder messages and disk write, and also inject node crash to the system and found this bug.
These are the step of events that my tools execute in sequence that lead to 2 leaders at the end.
My zookeeper nodes have id = 0,1,2
packetsend from=0 to=1 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=0 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=1 to=0 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=1 to=2 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=1 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=0 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=1 to=2 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=0 to=2 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
diskwrite nodeId=0 write=currentEpoch
nodecrash id=0
nodecrash id=1
nodestart id=0
nodestart id=1
diskwrite nodeId=2 write=currentEpoch
packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=0 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=0 to=1 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=0 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=1 to=2 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
packetsend from=2 to=0 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
packetsend from=1 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
packetsend from=1 to=2 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
diskwrite nodeId=2 write=currentEpoch
diskwrite nodeId=1 write=currentEpoch