[ZOOKEEPER-2791] Quorum doesn't recover after zxid rollover - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.6, 3.4.8
Fix Version/s: None
Component/s: leaderElection, quorum
Labels: None
Environment: Ubuntu 14.04.4 LTS, AWS EC2, 5-node ensembles

Description

When the zxid rolls over, the ensemble cannot recover without a manual restart of the cluster. The leader enters shutdown() when the zxid rolls over, but the remaining four nodes in the ensemble fail to elect a new leader. This state persisted for at least 15 minutes, until an operator manually restarted the cluster and the ensemble recovered.
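For readers unfamiliar with the term: a zxid is a 64-bit value whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter, so "rollover" here means the low 32 bits are exhausted and a new epoch (and therefore a new election) is needed. The following is a minimal sketch of that layout only. The class and method names (`ZxidSketch`, `counterExhausted`, and so on) are illustrative assumptions, not ZooKeeper's actual source.

```java
// Sketch of the zxid layout: high 32 bits = epoch, low 32 bits = counter.
// All names here are hypothetical; this is not ZooKeeper's implementation.
final class ZxidSketch {
    private static final long COUNTER_MASK = 0xffffffffL;

    // Combine an epoch and a counter into one 64-bit zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    // Extract the epoch from the high 32 bits.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // Extract the counter from the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    // True when the counter has no values left in the current epoch.
    static boolean counterExhausted(long zxid) {
        return counterOf(zxid) == COUNTER_MASK;
    }

    public static void main(String[] args) {
        // Last zxid of a hypothetical epoch 5.
        long last = makeZxid(5, COUNTER_MASK);
        System.out.printf("zxid=0x%016x epoch=%d exhausted=%b%n",
                last, epochOf(last), counterExhausted(last));

        // After a successful re-election the next zxid starts a new epoch at counter 0.
        long next = makeZxid(epochOf(last) + 1, 0);
        System.out.printf("zxid=0x%016x epoch=%d exhausted=%b%n",
                next, epochOf(next), counterExhausted(next));
    }
}
```

In this report the step that fails is the move from the exhausted epoch to the next one: the leader gives up leadership as intended, but no new leader, and therefore no new epoch, emerges.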

Config:
--------
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/raid0/zookeeper
clientPort=2181
maxClientCnxns=100
autopurge.snapRetainCount=14
autopurge.purgeInterval=24
leaderServes: True
server.7=172.26.134.88:2888:3888
server.6=172.26.136.143:2888:3888
server.5=172.26.135.103:2888:3888
server.4=172.26.134.16:2888:3888
server.9=172.26.135.19:2888:3888
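
For scale, the figures implied by the config above can be worked out directly. `initLimit` and `syncLimit` are expressed in ticks, so each timeout is the limit multiplied by `tickTime`, and a 5-server ensemble needs a majority of 3 to form a quorum. The sketch below only does that arithmetic. The class and method names are hypothetical, and these limits govern follower/leader connection and sync rather than leader election itself, so the comparison is indicative only.

```java
// Arithmetic on the values from the config above; names are illustrative only.
final class ConfigFigures {
    // Convert a limit expressed in ticks to milliseconds.
    static long limitMillis(long tickTimeMs, long limitTicks) {
        return tickTimeMs * limitTicks;
    }

    // Smallest number of servers that forms a strict majority.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    public static void main(String[] args) {
        long tickTimeMs = 2000; // tickTime=2000
        long initLimit = 10;    // initLimit=10
        long syncLimit = 5;     // syncLimit=5
        int servers = 5;        // five server.N entries
        int remaining = servers - 1; // leader has shut down

        System.out.println("initLimit window (ms): " + limitMillis(tickTimeMs, initLimit));
        System.out.println("syncLimit window (ms): " + limitMillis(tickTimeMs, syncLimit));
        System.out.println("majority needed: " + majority(servers));
        System.out.println("remaining nodes can form a quorum: "
                + (remaining >= majority(servers)));
    }
}
```

With these values the four surviving nodes are enough for a quorum, and the 15-minute outage is far longer than either configured window, which is consistent with the report that the ensemble was stuck rather than merely slow.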

Logs:

https://gist.github.com/mheffner/d615d358d4a360ae56a0d0a280040640

Issue Links

relates to: ZOOKEEPER-2789 Reassign `ZXID` for solving 32bit overflow problem (Open)

People

Assignee: Abraham Fine

Reporter: Mike Heffner

Votes: 0

Watchers: 5

Dates

Created: 24/May/17 15:03

Updated: 26/May/17 18:41