ZooKeeper / ZOOKEEPER-4816

A follower cannot join the cluster for 30 seconds


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.10.0
    • Fix Version/s: None
    • Component/s: leaderElection
    • Labels: None

    Description

      We encountered a strange scenario. When we set up a ZooKeeper cluster (3 nodes in total), the third node got stuck in sealStream, serializing the snapshot to the local disk. The leader election nevertheless ran normally, and the third node was elected leader. The other two nodes then failed to connect to that leader, so the first and second nodes restarted the leader election and the second node was elected leader. At this point the third node still acted as a leader, so there were two leaders in the cluster, and the first node could not join the cluster for 30 seconds.

      The logs of the first node are as follows.

      2024-07-01 02:35:28,223 [myid:] - INFO  [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:2, n.round:0x2, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:35:28,253 [myid:] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@996] - Notification time out: 12800 ms
      2024-07-01 02:35:28,254 [myid:] - INFO  [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:2, n.round:0x2, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:35:28,287 [myid:] - INFO  [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LEADING, n.leader:3, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:35:28,288 [myid:] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@1205] - Oracle indicates not to follow
      2024-07-01 02:35:29,493 [myid:] - WARN  [NIOWorkerThread-20:o.a.z.s.NIOServerCnxn@397] - Close of session 0x0

      During this procedure, clients cannot connect to any node of the cluster.

      Runtime logs are attached.

      Is the root cause that serializing the snapshot blocks the state change of the third node?
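
      If that hypothesis holds, the effect can be modeled with a minimal single-threaded sketch in plain Java (an illustration under that assumption, not ZooKeeper code; the sleep stands in for a fail-slow disk during sealStream): as long as the snapshot is sealed on the same code path that later updates the peer state, the state change and every notification that depends on it is delayed for the full duration of the slow write.

      // Minimal model of the suspected behavior (hypothetical, not ZooKeeper code):
      // if the snapshot is sealed on the same code path that later updates the peer
      // state, a slow sealStream() delays the state change and everything based on it.
      import java.util.concurrent.TimeUnit;

      public class SnapshotBlocksStateChange {

          enum PeerState { LOOKING, FOLLOWING, LEADING }

          static volatile PeerState state = PeerState.LOOKING;

          // Stand-in for sealing the snapshot on a fail-slow disk; the 30 s delay
          // mirrors the window reported above (an assumption, not a measured timing).
          static void slowSealStream() throws InterruptedException {
              TimeUnit.SECONDS.sleep(30);
          }

          public static void main(String[] args) throws InterruptedException {
              long start = System.currentTimeMillis();
              slowSealStream();            // snapshot serialization blocks here
              state = PeerState.LEADING;   // the state change only happens afterwards
              System.out.printf("state=%s after %d ms%n", state, System.currentTimeMillis() - start);
          }
      }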

       

      Sometimes this phenomenon does not occur. The configuration is the same as above; this time the second node is stuck in sealStream due to a fail-slow disk.

      // node1
      2024-07-01 02:19:21,942 [myid:] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@920] - Peer state changed: looking
      2024-07-01 02:19:21,943 [myid:] - WARN  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1602] - PeerState set to LOOKING
      2024-07-01 02:19:21,943 [myid:] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1471] - LOOKING
      2024-07-01 02:19:21,943 [myid:] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@946] - New election. My id = 1, proposed zxid=0x0
      2024-07-01 02:19:21,980 [myid:] - INFO  [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x2, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:19:22,012 [myid:] - INFO  [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:FOLLOWING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:19:22,026 [myid:] - INFO  [WorkerReceiver[myid=1]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LEADING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:19:22,026 [myid:] - INFO  [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@906] - Peer state changed: following
      // node3
      2024-07-01 02:19:22,780 [myid:] - INFO  [QuorumPeer[myid=3](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@920] - Peer state changed: looking
      2024-07-01 02:19:22,780 [myid:] - WARN  [QuorumPeer[myid=3](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1602] - PeerState set to LOOKING
      2024-07-01 02:19:22,780 [myid:] - INFO  [QuorumPeer[myid=3](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@1471] - LOOKING
      2024-07-01 02:19:22,781 [myid:] - INFO  [QuorumPeer[myid=3](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@946] - New election. My id = 3, proposed zxid=0x0
      2024-07-01 02:19:22,819 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:19:22,838 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:1, n.state:FOLLOWING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:19:22,849 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LEADING, n.leader:2, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
      2024-07-01 02:19:22,850 [myid:] - INFO  [QuorumPeer[myid=3](plain=0.0.0.0:2181)(secure=disabled):o.a.z.s.q.QuorumPeer@906] - Peer state changed: following

      But the leader election notifications are strange. Node1 decides to follow node2 because node3 reports that it follows node2, while node3 decides to follow node2 because node1 reports that it follows node2. These two conditions cannot both hold.
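
      For illustration only, here is a simplified model (not the actual FastLeaderElection code) of the decision a LOOKING node appears to make when it receives notifications from peers that are already FOLLOWING or LEADING: follow a candidate if the candidate itself claims LEADING and a quorum of servers name that candidate as leader. Under this model, node2's LEADING notification plus a single FOLLOWING notification already forms a quorum of the 3-node ensemble, which is how node1 and node3 can each justify following node2 partly on the strength of the other's notification.

      // Simplified model of how a LOOKING peer might accept an established leader from
      // FOLLOWING/LEADING notifications. Illustration of the observed behavior only;
      // the real logic lives in FastLeaderElection and is more involved.
      import java.util.List;

      public class FollowDecisionModel {

          enum State { LOOKING, FOLLOWING, LEADING }

          record Notification(long sid, State state, long leader) {}

          // Follow `candidate` if it claims LEADING itself and a quorum of distinct
          // servers (including the candidate) name it as leader.
          static boolean shouldFollow(long candidate, List<Notification> received, int ensembleSize) {
              boolean candidateClaimsLeading = received.stream()
                      .anyMatch(n -> n.sid() == candidate && n.state() == State.LEADING);
              long votesForCandidate = received.stream()
                      .filter(n -> n.leader() == candidate)
                      .map(Notification::sid)
                      .distinct()
                      .count();
              return candidateClaimsLeading && votesForCandidate > ensembleSize / 2;
          }

          public static void main(String[] args) {
              // What node1 saw in the second scenario: node3 FOLLOWING leader 2 and
              // node2 LEADING leader 2 (its own LOOKING vote is ignored here).
              List<Notification> seenByNode1 = List.of(
                      new Notification(3, State.FOLLOWING, 2),
                      new Notification(2, State.LEADING, 2));
              System.out.println("node1 follows node2: " + shouldFollow(2, seenByNode1, 3));
          }
      }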

      Moreover, both scenarios get stuck in sealStream, yet they produce different results. Is there a data race?
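
      As a rough timing model of the suspected race (again plain Java with made-up delays, not ZooKeeper internals): whether the stalled node finishes sealing the snapshot before or after the rest of the ensemble completes its election round decides whether it still advertises a stale state, so two runs of the same setup can end differently.

      // Conceptual timing model (hypothetical delays, not ZooKeeper code): identical setups
      // can diverge depending on whether the snapshot finishes before the election round does.
      import java.util.concurrent.CountDownLatch;
      import java.util.concurrent.ThreadLocalRandom;

      public class ElectionTimingModel {

          public static void main(String[] args) throws InterruptedException {
              CountDownLatch snapshotSealed = new CountDownLatch(1);

              Thread snapshotThread = new Thread(() -> {
                  try {
                      // Fail-slow disk: sealing the snapshot takes an unpredictable time.
                      Thread.sleep(ThreadLocalRandom.current().nextLong(50, 150));
                  } catch (InterruptedException ignored) {
                  } finally {
                      snapshotSealed.countDown();
                  }
              });

              Thread electionRound = new Thread(() -> {
                  try {
                      // The rest of the ensemble finishes its election round after ~100 ms.
                      Thread.sleep(100);
                  } catch (InterruptedException ignored) {
                  }
              });

              snapshotThread.start();
              electionRound.start();
              electionRound.join();

              // If the node is still sealing when the round ends, it keeps advertising its
              // stale state (first scenario); otherwise it joins cleanly (second scenario).
              System.out.println(snapshotSealed.getCount() > 0
                      ? "election finished while the node was still sealing (split view)"
                      : "node finished sealing in time (clean election)");
          }
      }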

      Are there any comments that could help figure out this issue? I would really appreciate them.

      Attachments

        1. system2_FLE.log (66 kB, mutu)
        2. system3_FLE.log (65 kB, mutu)
        3. system1_FLE.log (69 kB, mutu)
        4. system3_hang.log (83 kB, mutu)
        5. system2_hang.log (91 kB, mutu)
        6. system1_hang.log (87 kB, mutu)

        Activity

          People

            Assignee: Unassigned
            Reporter: mutu (gendong1)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: