We have a ZooKeeper 3.7.0 ensemble with servers 1-3. Using zkCli.sh we saw the following znode on server 1:
This znode was absent on server 2 and 3. A delete on the node failed everywhere.
This makes sense as a mutable operation goes to the leader, which was server 3 and where the node is absent. Restarting server 1 did not fix the issue. Stopping server 1, removing the zookeeper database and version-2 directory, and starting the server fixed the issue.
Session 0x20006d5d6a10000 was created by a ZooKeeper client and initially connected to server 2. The client later closed the session at around the same time as the then-leader server 2 was stopped:
That both were closed/shutdown at the same time is something we do on our side. Perhaps there is a race in the handling of closing of sessions and the shutdown of the leader?
We did a dump of server 1-3, and server 1 had two sessions that did not exist in server 2 and 3, and there was one ephemeral node reported on 1 that was not reported on server 2 and 3. The ephemeral owner matched one of the two extraneous sessions. The dump from server 1 with the extra entries: