  1. Apache Curator
  2. CURATOR-62

Leader Election Deadlock

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0-incubating
    • Fix Version/s: awaiting-response
    • Component/s: Recipes
    • Labels: None

    Description

      I've noticed that it is possible for a leader election to deadlock if a thread is interrupted while it is trying to acquire the mutex for the election.

      I've created a forced example of this here: https://github.com/dfjones/curator/commit/544220b1e6b51c2718a7d3511a74962ff1c5ff48

      You can see the deadlock by running the LeaderSelectorExample with my modified code. Some leaders may execute, but on my system the election eventually deadlocks. Note that I only see the deadlock when running against a remote ZooKeeper server, not the embedded test server. I'm using ZooKeeper 3.4.5 on Mac OS X 10.8.4.

      From what I can tell by inspecting the ZK state and watching in the debugger, the interrupted thread successfully creates the lock object in ZK. However, the interrupt raises an exception, so LockInternals#internalLockLoop never runs. Later, when LeaderSelector#doWork calls mutex.release(), the call fails at the check for lockData.

      Once this occurs, the lock object is never removed. It remains the oldest lock object in ZK, so the other participants wait behind it and the election deadlocks.
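
      To make the sequence above concrete, here is a minimal in-memory sketch of the failure mode. It is not Curator's code and uses no Curator or ZooKeeper APIs: the class OrphanedLockDemo, its nodes and lockData maps, and the cleanUpOnFailure flag are all hypothetical stand-ins. The flag shows one possible mitigation (deleting the node when acquisition is interrupted), which is an assumption on my part and not necessarily how the project would fix this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Simplified model of the orphaned lock node; NOT Curator's real implementation. */
public class OrphanedLockDemo {
    // Hypothetical stand-in for the sequential lock nodes in ZK (lowest key = oldest).
    private static final TreeMap<Integer, String> nodes = new TreeMap<>();
    // Hypothetical stand-in for the per-thread lock data that release() checks.
    private static final Map<Thread, Integer> lockData = new HashMap<>();
    private static int nextSeq = 0;

    static void acquire(boolean cleanUpOnFailure) throws InterruptedException {
        int seq = nextSeq++;
        nodes.put(seq, Thread.currentThread().getName()); // the node now exists
        try {
            // Stand-in for the wait loop: a pending interrupt aborts it before it runs.
            if (Thread.interrupted()) {
                throw new InterruptedException("interrupted before the wait loop ran");
            }
            lockData.put(Thread.currentThread(), seq); // only recorded on success
        } catch (InterruptedException e) {
            if (cleanUpOnFailure) {
                nodes.remove(seq); // assumed mitigation: do not leave the node behind
            }
            throw e;
        }
    }

    static void release() {
        Integer seq = lockData.remove(Thread.currentThread());
        if (seq == null) {
            // No lock data was recorded, so the node cannot be found and deleted here.
            throw new IllegalMonitorStateException("You do not own the lock");
        }
        nodes.remove(seq);
    }

    /** Runs one interrupted acquire/release cycle and returns how many nodes are left. */
    public static int simulate(boolean cleanUpOnFailure) {
        nodes.clear();
        lockData.clear();
        Thread.currentThread().interrupt(); // force the interrupt, as in the forced example
        try {
            acquire(cleanUpOnFailure);
        } catch (InterruptedException e) {
            // Acquisition failed; Thread.interrupted() already cleared the flag.
        }
        try {
            release();
        } catch (IllegalMonitorStateException e) {
            // Expected: this thread never recorded any lock data.
        }
        return nodes.size();
    }

    public static void main(String[] args) {
        System.out.println("orphaned nodes without cleanup: " + simulate(false));
        System.out.println("orphaned nodes with cleanup: " + simulate(true));
    }
}
```

      In this model, simulate(false) leaves one node behind and simulate(true) leaves none. The leftover node has the lowest sequence number, which is why any later contender would wait behind it.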

          People

            Assignee: Jordan Zimmerman (randgalt)
            Reporter: Duncan Jones (djones)
            Votes: 0
            Watchers: 4
