[ZOOKEEPER-975] new peer goes in LEADING state even if ensemble is online - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.2
Fix Version/s: 3.4.0
Component/s: None
Labels: None

Description

Scenario:
1. 2 of the 3 ZK nodes are online.
2. The third node attempts to join.
3. The third node unnecessarily goes into the LEADING state.
4. The third node then goes back to LOOKING (no majority of followers) and finally goes to the FOLLOWING state.

While going through the logs I noticed that a peer C that is trying to
join an already formed cluster goes into the LEADING state. This is
because the QuorumCnxManager of A and of B sends the entire history of
notification messages to C. C therefore receives the notification
messages that A and B exchanged while they were forming the cluster.

In FastLeaderElection.lookForLeader(), the following piece of code
makes C quit lookForLeader() on the assumption that it is supposed to
lead.

    //If have received from all nodes, then terminate
    if ((self.getVotingView().size() == recvset.size()) &&
            (self.getQuorumVerifier().getWeight(proposedLeader) != 0)) {
        self.setPeerState((proposedLeader == self.getId()) ?
                ServerState.LEADING: learningState());
        leaveInstance();
        return new Vote(proposedLeader, proposedZxid);

    } else if (termPredicate(recvset,
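To make the effect of replayed notifications concrete, here is a minimal sketch of the "received from all nodes" check. It is not ZooKeeper code: the class StaleVoteDemo, the server ids 1, 2 and 3 for A, B and C, the equal zxids, and the simplified vote ordering (higher zxid wins, ties broken by higher server id) are all assumptions made for illustration. The real lookForLeader() also tracks election epochs and peer states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the check quoted above; not ZooKeeper code.
class StaleVoteDemo {

    // Assumed ordering: higher zxid wins, ties are broken by higher server id.
    static boolean totalOrder(long newId, long newZxid, long curId, long curZxid) {
        return newZxid > curZxid || (newZxid == curZxid && newId > curId);
    }

    // Each notification is {senderId, votedLeader, votedZxid}.
    // Returns LEADING or FOLLOWING once a vote from every node has been
    // recorded, and LOOKING if the notifications run out before that.
    static String decide(long selfId, long selfZxid, int ensembleSize,
                         long[][] notifications) {
        long proposedLeader = selfId;
        long proposedZxid = selfZxid;
        Map<Long, Long> recvset = new HashMap<>(); // sender id -> voted leader
        recvset.put(selfId, selfId);               // the peer's own vote
        for (long[] n : notifications) {
            if (totalOrder(n[1], n[2], proposedLeader, proposedZxid)) {
                proposedLeader = n[1];
                proposedZxid = n[2];
            }
            recvset.put(n[0], n[1]);
            if (recvset.size() == ensembleSize) {
                return proposedLeader == selfId ? "LEADING" : "FOLLOWING";
            }
        }
        return "LOOKING";
    }

    public static void main(String[] args) {
        // Replayed history: A (1) and B (2) each voted for themselves when
        // they first started. C (3) has the highest id and an equal zxid.
        long[][] stale = { {1, 1, 0}, {2, 2, 0} };
        System.out.println(decide(3, 0, 3, stale));
    }
}
```

Under these assumptions neither replayed vote beats C's own proposal, yet both count towards recvset, so the check fires with C still proposing itself.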

This early exit can cause:
1. C to unnecessarily go into the LEADING state, wait for tickTime * initLimit, and then restart FLE.

2. C to wait for 200 ms (finalizeWait) and then make a decision based
on whatever notifications it has received so far. C could potentially
decide to follow an old leader, fail to connect to that leader, and
then restart FLE. See code below.

    if (termPredicate(recvset,
            new Vote(proposedLeader, proposedZxid,
                    logicalclock))) {

        // Verify if there is any change in the proposed leader
        while((n = recvqueue.poll(finalizeWait,
                TimeUnit.MILLISECONDS)) != null){
            if(totalOrderPredicate(n.leader, n.zxid,
                    proposedLeader, proposedZxid)) {
                recvqueue.put(n);
                break;
            }
        }
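The finalizeWait loop above can be sketched in isolation as follows. This is an illustrative model only: the class FinalizeWaitDemo, the {leader, zxid} array encoding, and the inlined comparison (higher zxid wins, ties broken by higher server id) are assumptions, not the ZooKeeper implementation.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the finalizeWait loop; not ZooKeeper code.
class FinalizeWaitDemo {

    // Drains queued votes, each encoded as {leader, zxid}. Returns true if the
    // queue stays quiet for waitMs without any vote beating the proposal.
    // Returns false if a better vote shows up (it is put back on the queue
    // for the caller) or if the thread is interrupted.
    static boolean proposalHolds(LinkedBlockingQueue<long[]> recvqueue,
                                 long proposedLeader, long proposedZxid,
                                 long waitMs) {
        try {
            long[] n;
            while ((n = recvqueue.poll(waitMs, TimeUnit.MILLISECONDS)) != null) {
                boolean better = n[1] > proposedZxid
                        || (n[1] == proposedZxid && n[0] > proposedLeader);
                if (better) {
                    recvqueue.put(n); // hand it back for normal processing
                    return false;
                }
                // Votes that do not beat the proposal are silently consumed.
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<long[]> q = new LinkedBlockingQueue<>();
        q.add(new long[] {1, 0}); // an old vote for server 1
        System.out.println(proposalHolds(q, 2, 0, 200));
    }
}
```

In this model an old vote that does not beat the current proposal is consumed without changing anything, and the proposal is accepted once the queue has been quiet for the wait period, regardless of how old the votes behind that proposal are.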

In general, this does not affect the correctness of FLE, since C will
eventually reach the FOLLOWING state (A and B won't vote for C).
However, it delays C from joining the cluster, which can in turn
lengthen the recovery time of an application.

Proposal: A and B should send only the most recent notification
instead of the entire history. Does this sound reasonable?
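One way to picture the proposal is a per-peer outbox that holds at most one pending message, so that a newer notification overwrites an unsent older one. The sketch below is only an illustration of that idea; the class LatestOnlyOutbox and its methods are hypothetical and are not taken from QuorumCnxManager or from the attached patches.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of ZooKeeper: keeps only the newest
// unsent message for one destination peer.
class LatestOnlyOutbox<T> {
    private final AtomicReference<T> pending = new AtomicReference<>();

    // Overwrites whatever was waiting, so stale notifications are dropped.
    void put(T msg) {
        pending.set(msg);
    }

    // Takes the pending message, or returns null if nothing is waiting.
    T poll() {
        return pending.getAndSet(null);
    }

    public static void main(String[] args) {
        LatestOnlyOutbox<String> toC = new LatestOnlyOutbox<>();
        toC.put("A: LOOKING, vote for A"); // sent while the cluster was forming
        toC.put("A: LOOKING, vote for B");
        toC.put("A: FOLLOWING B");         // A's current view
        System.out.println(toC.poll());    // only the last message is delivered
        System.out.println(toC.poll());    // nothing left
    }
}
```

With such an outbox, a peer that connects late would see each sender's current view rather than a replay of the election that formed the cluster.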

Attachments

ZOOKEEPER-975.patch    23/Mar/11 23:26   24 kB   Vishal Kher
ZOOKEEPER-975.patch    25/Jan/11 07:31    6 kB   Vishal Kher
ZOOKEEPER-975.patch2   24/Mar/11 22:32   24 kB   Vishal Kher
ZOOKEEPER-975.patch3   29/Mar/11 17:50   29 kB   Vishal Kher
ZOOKEEPER-975.patch4   12/Apr/11 23:44   32 kB   Vishal Kher
ZOOKEEPER-975.patch5   13/Apr/11 16:23   32 kB   Vishal Kher
ZOOKEEPER-975.patch6   25/Apr/11 18:30   32 kB   Vishal Kher

People

Assignee: Vishal Kher
Reporter: Vishal Kher
Votes: 0
Watchers: 3

Dates

Created: 14/Jan/11 12:29
Updated: 23/Nov/11 19:22
Resolved: 29/Apr/11 16:13