Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.2.0-incubating
Fix Version/s: None
Description
I've noticed that a leader election can deadlock if a thread is interrupted while it is trying to acquire the election mutex.
I've created a contrived example that forces this here: https://github.com/dfjones/curator/commit/544220b1e6b51c2718a7d3511a74962ff1c5ff48
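A minimal sketch of the trigger, using only the JDK and not Curator: a `CountDownLatch` that is never counted down stands in for waiting on the election mutex (an assumption for illustration; in Curator the wait happens inside `LockInternals#internalLockLoop`). Interrupting the waiting thread makes its blocking call throw `InterruptedException`, which is how an in-progress acquire gets abandoned. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical JDK-only model: interrupting a thread blocked on "acquire". */
public class InterruptedAcquireSketch {
    public static boolean interruptWhileWaiting() {
        // Never counted down, so await() blocks like a contender waiting for the mutex.
        CountDownLatch mutexAvailable = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        Thread contender = new Thread(() -> {
            try {
                mutexAvailable.await();
            } catch (InterruptedException e) {
                // The acquire attempt is abandoned here.
                sawInterrupt.set(true);
            }
        });
        contender.start();
        // If the interrupt lands before await() is reached, await() still throws
        // immediately because it checks the thread's interrupt status first.
        contender.interrupt();
        try {
            contender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println("acquire interrupted: " + interruptWhileWaiting());
    }
}
```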
You can see the deadlock by running the LeaderSelectorExample with my modified code. Some leaders may execute, but on my system a deadlock eventually occurs. Note that I only see the deadlock when running against a remote ZooKeeper server, not the embedded test server. I'm using ZooKeeper 3.4.5 on Mac OS X 10.8.4.
From what I can tell by inspecting the ZooKeeper state and watching in the debugger, the interrupted thread does successfully create its lock node in ZooKeeper. However, the interrupt causes an exception before LockInternals#internalLockLoop ever runs. Later, when LeaderSelector#doWork calls mutex.release(), the release fails at the check for lockData: because the acquire never completed, no lockData was recorded for the thread, so its lock node is never deleted.
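The sequence above can be modeled in memory. This is a hypothetical sketch, not Curator code: a `TreeMap` stands in for the sequential child nodes under the ZooKeeper lock path, a `HashMap` stands in for the mutex's per-thread `lockData`, and node creation plus the wait are collapsed into one `acquire` method. The `interruptAfterCreate` flag is an illustrative assumption that forces the exception to escape after the node exists but before ownership is recorded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of an interrupted acquire leaving an orphaned lock node. */
public class OrphanedLockSketch {
    private final TreeMap<Long, String> znodes = new TreeMap<>();  // stand-in for ZK children
    private final Map<String, Long> lockData = new HashMap<>();    // stand-in for per-thread lockData
    private long nextSequence = 0;

    /** Creates the node first; ownership is recorded only if acquire completes normally. */
    public void acquire(String owner, boolean interruptAfterCreate) throws InterruptedException {
        long seq = nextSequence++;
        znodes.put(seq, owner); // the node now exists on the "server"
        if (interruptAfterCreate) {
            // The exception escapes before the lock loop: nothing records ownership
            // and nothing deletes the node that was just created.
            throw new InterruptedException("interrupted after create");
        }
        lockData.put(owner, seq);
    }

    /** Mirrors the lockData check: with no record, release fails and the node stays. */
    public void release(String owner) {
        Long seq = lockData.remove(owner);
        if (seq == null) {
            throw new IllegalMonitorStateException("no lockData for " + owner);
        }
        znodes.remove(seq);
    }

    public int nodeCount() {
        return znodes.size();
    }

    public static void main(String[] args) {
        OrphanedLockSketch lock = new OrphanedLockSketch();
        try {
            lock.acquire("leader-1", true);
        } catch (InterruptedException e) {
            System.out.println("acquire failed: " + e.getMessage());
        }
        try {
            lock.release("leader-1");
        } catch (IllegalMonitorStateException e) {
            System.out.println("release failed: " + e.getMessage());
        }
        System.out.println("nodes left behind: " + lock.nodeCount());
    }
}
```

On the normal path (`interruptAfterCreate` false) the same `release` call succeeds and removes the node, which is why the orphan only appears after an interrupted acquire.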
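Once this happens, the orphaned lock node is the oldest (lowest-sequence) node under the lock path, so it effectively holds the lock forever and every other participant keeps waiting on it: the election deadlocks.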
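Why the oldest node blocks everyone can be sketched with the ordering rule of the standard ZooKeeper lock recipe, which Curator's mutex follows: the lowest sequence number holds the lock, and each other participant watches the node just below its own. This is a hypothetical in-memory model; the participant names and sequence numbers are illustrative only.

```java
import java.util.TreeMap;

/** Hypothetical model of lock-recipe ordering: lowest sequence number wins. */
public class LowestSequenceSketch {
    private final TreeMap<Long, String> children = new TreeMap<>();

    public void add(long sequence, String owner) {
        children.put(sequence, owner);
    }

    /** The owner of the lowest sequence number holds the lock. */
    public String holder() {
        return children.isEmpty() ? null : children.firstEntry().getValue();
    }

    /** The node a participant watches: the next-lower sequence, or null if it holds the lock. */
    public Long watchedBy(long sequence) {
        return children.lowerKey(sequence);
    }

    public static void main(String[] args) {
        LowestSequenceSketch lock = new LowestSequenceSketch();
        lock.add(0, "orphan");   // left by the interrupted thread; nothing will delete it
        lock.add(1, "leader-A");
        lock.add(2, "leader-B");
        System.out.println("holder: " + lock.holder());               // orphan
        System.out.println("leader-A watches: " + lock.watchedBy(1)); // 0
        System.out.println("leader-B watches: " + lock.watchedBy(2)); // 1
    }
}
```

Because nothing ever removes sequence 0, leader-A's watch never fires, and leader-B stays queued behind leader-A.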