Details
- Bug
- Status: Resolved
- Major
- Resolution: Won't Fix
- 1.0.0
- None
- None
Description
It seems that an edge case exists which can lead to sessions "un-expiring" during a ZooKeeper leadership failover. Additional details can be found in ZOOKEEPER-2985.
This leads to a NODEEXISTS error when attempting to re-create the ephemeral brokers/ids/{id} node in ZkUtils.registerBrokerInZk. We experienced this issue on each node within a 3-node Kafka cluster running 1.0.0. All three nodes continued running (producers and consumers appeared unaffected), but none of the nodes were considered online and partition leadership could not be re-assigned.
I took a quick look at trunk and I believe the issue is still present, but it has moved into KafkaZkClient.checkedEphemeralCreate, which raises an error when it finds that the brokers/ids/{id} node exists but belongs to the old (believed expired) session.
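To make the failure mode concrete, here is a minimal in-memory sketch of the ownership check described above. It does not use the real ZooKeeper or Kafka APIs: `FakeZooKeeper`, `NodeExistsError`, `checked_ephemeral_create` and the session ids are hypothetical stand-ins, and the check only approximates what I understand the real code to do.

```python
class NodeExistsError(Exception):
    """Stands in for ZooKeeper's NODEEXISTS result code."""


class FakeZooKeeper:
    """Toy model (hypothetical): maps each ephemeral path to its owner session id."""

    def __init__(self):
        self.ephemerals = {}

    def create_ephemeral(self, path, session_id):
        if path in self.ephemerals:
            raise NodeExistsError(path)
        self.ephemerals[path] = session_id

    def expire_session(self, session_id):
        # Expected behaviour: expiring a session removes its ephemeral nodes.
        self.ephemerals = {
            p: s for p, s in self.ephemerals.items() if s != session_id
        }


def checked_ephemeral_create(zk, path, session_id):
    """Create the node; tolerate an existing node only if this session owns it."""
    try:
        zk.create_ephemeral(path, session_id)
    except NodeExistsError:
        owner = zk.ephemerals[path]
        if owner != session_id:
            raise RuntimeError(
                f"{path} exists but is owned by session {owner:#x}, "
                f"not {session_id:#x}"
            )


zk = FakeZooKeeper()
path = "/brokers/ids/1"
checked_ephemeral_create(zk, path, session_id=0xA1)

# Healthy case: the old session expires, its node goes away, and the broker
# re-registers under a new session.
zk.expire_session(0xA1)
checked_ephemeral_create(zk, path, session_id=0xB2)

# Edge case: the broker believes 0xB2 expired and opens session 0xC3, but the
# ensemble never removed the old session's node (expire_session is not called).
try:
    checked_ephemeral_create(zk, path, session_id=0xC3)
    outcome = "registered"
except RuntimeError:
    outcome = "failed"
```

In this model the last registration attempt fails and the node stays owned by the old session, which matches the symptom reported here: the broker process keeps running but is never registered.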
NOTE: KAFKA-7165 introduced a workaround to cope with the case described here. We decided to keep this issue open to track the status of ZOOKEEPER-2985.
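The note does not say what the workaround does. One plausible shape, assumed here purely for illustration, is to delete a node left behind by a different session and then recreate it under the current one. The function name `register_with_retry`, the dict-based node store and the session ids below are all hypothetical.

```python
def register_with_retry(ephemerals, path, session_id):
    """Register `path` for `session_id` in a toy node store.

    `ephemerals` is a plain dict mapping path -> owner session id; it is a
    stand-in for ZooKeeper, not a real client.
    Returns "created", "already-owned" or "replaced-stale".
    """
    owner = ephemerals.get(path)
    if owner is None:
        ephemerals[path] = session_id
        return "created"
    if owner == session_id:
        return "already-owned"
    # Assumption: the node belongs to a session this broker believes has
    # expired, so it is deleted and recreated under the current session.
    del ephemerals[path]
    ephemerals[path] = session_id
    return "replaced-stale"


# The node below was left behind by the "un-expired" session 0xA1.
nodes = {"/brokers/ids/1": 0xA1}
result = register_with_retry(nodes, "/brokers/ids/1", session_id=0xB2)
```

A caveat with this approach: removing a node owned by another session is only safe if broker ids are unique in the cluster. If two live brokers were misconfigured with the same id, they would keep deleting each other's registration.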
Issue Links
- is related to
  - ZOOKEEPER-2985 Expired session may unexpired after leader failover (Open)
- relates to
  - KAFKA-4277 creating ephemeral node already exist (Resolved)
  - KAFKA-5971 Broker keeps running even though not registered in ZK (Resolved)
  - KAFKA-7165 Error while creating ephemeral at /brokers/ids/BROKER_ID (Resolved)