Apache Curator / CURATOR-264

Leader election: Duplicate ephemeral nodes with same owner id

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.1
    • Component/s: Framework, Recipes
    • Labels: None

    Description

      Our leader election sometimes fails when we have network issues. When this happens, we see two ephemeral nodes in the ZooKeeper cluster for the same session, but there is no active leader.

      I have managed to recreate the same scenario by running a test locally and using iptables to simulate network issues. The debug log (see attachment) shows that findAndDeleteProtectedNodeInBackground does not delete the node because processResult in FindProtectedNodeCB receives a -101 (NoNode) result code. I suspect this can happen if the read is not synced? (http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees)

      This also seems to be related to:
      https://issues.apache.org/jira/browse/CURATOR-45 and
      https://issues.apache.org/jira/browse/CURATOR-79
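
      The symptom described above (two ephemeral nodes owned by the same session) can be checked for with something like the following. This is only a minimal sketch and not code from Curator: the class name, method name, node names, and session ids are all made up for illustration. It assumes the node name to owner mapping has already been collected, for example from ZooKeeper's Stat.getEphemeralOwner() for each child of the election path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Curator or ZooKeeper.
public class DuplicateOwnerCheck {

    /**
     * Groups node names by their ephemeral owner (session id) and keeps
     * only the owners that hold more than one node.
     */
    public static Map<Long, List<String>> findDuplicateOwners(Map<String, Long> ownerByNode) {
        Map<Long, List<String>> nodesByOwner = new TreeMap<>();
        for (Map.Entry<String, Long> entry : ownerByNode.entrySet()) {
            nodesByOwner
                .computeIfAbsent(entry.getValue(), owner -> new ArrayList<>())
                .add(entry.getKey());
        }
        // Drop owners with a single node; those are the normal case.
        nodesByOwner.values().removeIf(nodes -> nodes.size() < 2);
        return nodesByOwner;
    }

    public static void main(String[] args) {
        // Made-up node names and session ids, for illustration only.
        Map<String, Long> ownerByNode = new LinkedHashMap<>();
        ownerByNode.put("node-a-0000000001", 0x1L);
        ownerByNode.put("node-b-0000000002", 0x1L);
        ownerByNode.put("node-c-0000000003", 0x2L);

        // Prints every session id that owns more than one node.
        System.out.println(findDuplicateOwners(ownerByNode));
    }
}
```

      A non-empty result would match the state reported here: one session holding two election nodes at once.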

      Attachments

        1. zkTransactionLog.txt (1 kB, Ole Hjalmar Herje)
        2. zkNodes.txt (1 kB, Ole Hjalmar Herje)
        3. testLog.txt (37 kB, Ole Hjalmar Herje)

          People

            Assignee: Jordan Zimmerman (randgalt)
            Reporter: Ole Hjalmar Herje (ollis)