Description
I would like to expose you to a problem that we are experiencing.
We are using a cluster of 7 zookeeper and we use them to implement a distributed lock using Curator (http://curator.apache.org/curator-recipes/shared-reentrant-lock.html)
So .. we tried to play with the servers to see if everything worked properly and we stopped and start servers to see that the system worked well
(like stop 03, stop 05, stop 06, start 05, start 06, start 03)
We saw a strange behavior.
The number of znodes grew up without stopping (normally we had 4000 or 5000, we got to 60,000 and then we stopped our application)
In zookeeeper logs I saw this (on leader only, one every minute)
2016-07-04 14:53:50,302 [myid:7] - ERROR [ContainerManagerTask:ContainerManager$1@84] - Error checking containers
java.lang.NullPointerException
at org.apache.zookeeper.server.ContainerManager.getCandidates(ContainerManager.java:151)
at org.apache.zookeeper.server.ContainerManager.checkContainers(ContainerManager.java:111)
at org.apache.zookeeper.server.ContainerManager$1.run(ContainerManager.java:78)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
We have not yet deleted the data ... so the problem can be reproduced on our servers
Attachments
Attachments
Issue Links
- Is contained by
-
ZOOKEEPER-2680 Correct DataNode.getChildren() inconsistent behaviour.
- Closed
- is duplicated by
-
ZOOKEEPER-2705 Container node remains indefinitely after session has long expired!
- Resolved