KAFKA-6584: Session expiration concurrent with ZooKeeper leadership failover may lead to broker registration failure


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: zkclient
    • Labels: None

    Description

      It seems that an edge case exists which can lead to sessions "un-expiring" during a ZooKeeper leadership failover. Additional details can be found in ZOOKEEPER-2985.

      This leads to a NODEEXISTS error when attempting to re-create the ephemeral /brokers/ids/{id} node in ZkUtils.registerBrokerInZk. We experienced this issue on each node of a 3-node Kafka cluster running 1.0.0. All three nodes continued running (producers and consumers appeared unaffected), but none of the nodes were considered online and partition leadership could not be re-assigned.

      I took a quick look at trunk and I believe the issue is still present but has moved into KafkaZkClient.checkedEphemeralCreate, which raises an error when it finds that the /brokers/ids/{id} node exists but belongs to the old (believed expired) session.
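
To make the failure mode concrete, here is a minimal sketch using a toy in-memory stand-in for ZooKeeper. FakeZk, NodeExistsError, checked_ephemeral_create and the session ids are hypothetical names invented for illustration; this is not Kafka's or ZooKeeper's actual code, only a model of the ownership check described above.

```python
# Toy in-memory model. FakeZk and its methods are hypothetical,
# not a real ZooKeeper client API.
class NodeExistsError(Exception):
    pass


class FakeZk:
    """Maps each ephemeral path to the id of the session that owns it."""

    def __init__(self):
        self.owners = {}

    def create_ephemeral(self, path, session_id):
        if path in self.owners:
            raise NodeExistsError(path)
        self.owners[path] = session_id


def checked_ephemeral_create(zk, path, session_id):
    """Create the node; tolerate a conflict only if this session owns it."""
    try:
        zk.create_ephemeral(path, session_id)
        return "created"
    except NodeExistsError:
        if zk.owners[path] == session_id:
            return "already-owned"
        # The node belongs to a different (believed expired) session.
        raise


zk = FakeZk()
path = "/brokers/ids/0"
old_session, new_session = 0x1001, 0x1002

checked_ephemeral_create(zk, path, old_session)  # initial registration

# The broker sees its session expire and opens a new one, but the old
# session's node was never removed (the "un-expired" session).
try:
    checked_ephemeral_create(zk, path, new_session)
    outcome = "registered"
except NodeExistsError:
    outcome = "registration failed"

print(outcome)  # prints "registration failed"
```

In this model the broker process keeps running, but its registration node stays owned by the old session, which matches the symptom reported above.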

      NOTE: KAFKA-7165 introduced a workaround to cope with the case described here. We decided to keep this issue open to track the status of ZOOKEEPER-2985.
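
One way such a workaround could look is sketched below: when the existing node is owned by a different session, delete it and create it again. This only illustrates the general idea and is not taken from the KAFKA-7165 patch. FakeZk, NodeExistsError and create_replacing_stale are hypothetical names, and the toy store ignores concurrency.

```python
# Toy in-memory model. All names here are hypothetical and exist
# only for this illustration.
class NodeExistsError(Exception):
    pass


class FakeZk:
    """Maps each ephemeral path to the id of the session that owns it."""

    def __init__(self):
        self.owners = {}

    def create_ephemeral(self, path, session_id):
        if path in self.owners:
            raise NodeExistsError(path)
        self.owners[path] = session_id

    def delete(self, path):
        self.owners.pop(path, None)


def create_replacing_stale(zk, path, session_id):
    """Create the node, replacing one left behind by another session."""
    try:
        zk.create_ephemeral(path, session_id)
        return "created"
    except NodeExistsError:
        if zk.owners.get(path) == session_id:
            return "already-owned"
        # Assumption: the other owner is this broker's previous session,
        # so the leftover node is stale and safe to remove.
        zk.delete(path)
        zk.create_ephemeral(path, session_id)
        return "replaced"


zk = FakeZk()
path = "/brokers/ids/0"
zk.create_ephemeral(path, 0x1001)  # node left by the old session

result = create_replacing_stale(zk, path, 0x1002)
print(result)  # prints "replaced"
```

The assumption in the comment matters: if the other owner were a live session (for example, a second broker configured with the same id), deleting its node would be wrong, so a real implementation would need some way to tell the two cases apart.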


            People

              Assignee: Unassigned
              Reporter: Chris Thunes (cthunes)
              Votes: 1
              Watchers: 12
