Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.3.6, 3.4.8
-
None
-
None
-
Ubuntu 14.04.4 LTS, AWS EC2, 5 node ensembles
Description
When zxid rolls over the ensemble is unable to recover without manually restarting the cluster. The leader enters shutdown() state when zxid rolls over, but the remaining four nodes in the ensemble are not able to re-elect a new leader. This state has persisted for at least 15 minutes before an operator manually restarted the cluster and the ensemble recovered.
Config:
--------
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/raid0/zookeeper
clientPort=2181
maxClientCnxns=100
autopurge.snapRetainCount=14
autopurge.purgeInterval=24
leaderServes: True
server.7=172.26.134.88:2888:3888
server.6=172.26.136.143:2888:3888
server.5=172.26.135.103:2888:3888
server.4=172.26.134.16:2888:3888
server.9=172.26.135.19:2888:3888
Logs:
https://gist.github.com/mheffner/d615d358d4a360ae56a0d0a280040640
Attachments
Issue Links
- relates to
-
ZOOKEEPER-2789 Reassign `ZXID` for solving 32bit overflow problem
- Open