[CURATOR-264] Leader election: Duplicate ephemeral nodes with same owner id - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.8.0
Fix Version/s: 2.9.1
Component/s: Framework, Recipes
Labels: None

Description

Our leader-election functionality sometimes fails when we have network issues. When this happens, we see two ephemeral nodes in the ZooKeeper cluster for the same session, but there is no active leader.

I have managed to recreate the scenario by running a test locally and using iptables to simulate network issues. The debug log (see attachments) shows that findAndDeleteProtectedNodeInBackground does not delete the node because processResult in FindProtectedNodeCB receives a result code of -101 (NoNode). I suspect this can happen if the read is not synced? (http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees)

This also seems to be related to:
https://issues.apache.org/jira/browse/CURATOR-45 and
https://issues.apache.org/jira/browse/CURATOR-79
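
The symptom described above (two ephemeral nodes owned by the same session) can be checked for mechanically. The following is a minimal sketch, not code from Curator or from the attached logs: the class `DuplicateOwnerCheck`, the `Znode` holder, and the sample node names and session ids are all hypothetical. It assumes you have already collected each child's name and the `ephemeralOwner` value from its ZooKeeper `Stat`, and it groups the children by owner to report sessions holding more than one node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateOwnerCheck {
    // Hypothetical holder for one child znode: its name and the
    // ephemeralOwner (session id) taken from its Stat.
    static final class Znode {
        final String name;
        final long ephemeralOwner;

        Znode(String name, long ephemeralOwner) {
            this.name = name;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    // Returns session id -> node names, keeping only sessions that own
    // two or more ephemeral nodes under the inspected parent.
    static Map<Long, List<String>> duplicatesByOwner(List<Znode> children) {
        Map<Long, List<String>> byOwner = new LinkedHashMap<>();
        for (Znode z : children) {
            if (z.ephemeralOwner == 0L) {
                continue; // ZooKeeper reports 0 for non-ephemeral nodes
            }
            byOwner.computeIfAbsent(z.ephemeralOwner, k -> new ArrayList<>()).add(z.name);
        }
        byOwner.values().removeIf(names -> names.size() < 2);
        return byOwner;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Znode> children = List.of(
                new Znode("node-a-0000000001", 0x1001L),
                new Znode("node-b-0000000002", 0x1001L),
                new Znode("node-c-0000000003", 0x2002L));
        System.out.println(duplicatesByOwner(children));
    }
}
```

A non-empty result for the election path, combined with no participant reporting leadership, would match the state described in this issue.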

Attachments


- zkTransactionLog.txt (1 kB), 22/Sep/15 07:00, Ole Hjalmar Herje
- zkNodes.txt (1 kB), 22/Sep/15 06:59, Ole Hjalmar Herje
- testLog.txt (37 kB), 22/Sep/15 07:03, Ole Hjalmar Herje

People

Assignee: Jordan Zimmerman
Reporter: Ole Hjalmar Herje
Votes: 0
Watchers: 4

Dates

Created: 22/Sep/15 06:57
Updated: 23/Sep/15 13:22
Resolved: 23/Sep/15 13:22