Kafka / KAFKA-150

Confusing NodeExistsException failing kafka broker startup

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.7
    • Component/s: core
    • Labels: None

    Description

      Sometimes broker startup fails with the following exception:

      [2011-10-03 15:33:22,193] INFO Awaiting connections on port 9092 (kafka.network.Acceptor)
      [2011-10-03 15:33:22,193] INFO Registering broker /brokers/ids/0 (kafka.server.KafkaZooKeeper)
      [2011-10-03 15:33:22,229] INFO conflict in /brokers/ids/0 data: 10.98.20.109-1317681202194:10.98.20.109:9092 stored data: 10.98.20.109-1317268078266:10.98.20.109:9092 (kafka.utils.ZkUtils$)
      [2011-10-03 15:33:22,230] FATAL org.I0Itec.zkclient.exception.ZkNodeExistsException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /brokers/ids/0 (kafka.server.KafkaServer)
      [2011-10-03 15:33:22,231] FATAL org.I0Itec.zkclient.exception.ZkNodeExistsException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /brokers/ids/0
          at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:55)
          at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
          at org.I0Itec.zkclient.ZkClient.create(ZkClient.java:304)
          at org.I0Itec.zkclient.ZkClient.createEphemeral(ZkClient.java:328)
          at kafka.utils.ZkUtils$.createEphemeralPath(ZkUtils.scala:55)
          at kafka.utils.ZkUtils$.createEphemeralPathExpectConflict(ZkUtils.scala:71)
          at kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:54)
          at kafka.log.LogManager.startup(LogManager.scala:122)
          at kafka.server.KafkaServer.startup(KafkaServer.scala:77)
          at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:40)
          at kafka.Kafka$.main(Kafka.scala:56)
          at kafka.Kafka.main(Kafka.scala)
      Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /brokers/ids/0
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
          at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
          at org.I0Itec.zkclient.ZkConnection.create(ZkConnection.java:87)
          at org.I0Itec.zkclient.ZkClient$1.call(ZkClient.java:308)
          at org.I0Itec.zkclient.ZkClient$1.call(ZkClient.java:304)
          at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
          ... 10 more
      (kafka.server.KafkaServer)
      [2011-10-03 15:33:22,231] INFO Shutting down... (kafka.server.KafkaServer)
      [2011-10-03 15:33:22,232] INFO shutdown scheduler kafka-logcleaner- (kafka.utils.KafkaScheduler)
      [2011-10-03 15:33:22,239] INFO shutdown scheduler kafka-logflusher- (kafka.utils.KafkaScheduler)
      [2011-10-03 15:33:22,481] INFO zkActor stopped (kafka.log.LogManager)
      [2011-10-03 15:33:22,482] INFO Closing zookeeper client... (kafka.server.KafkaZooKeeper)
      [2011-10-03 15:33:22,482] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)

      There are three possible causes:
      (1) Kafka was restarted within the ZooKeeper session timeout, so as far as ZooKeeper is concerned the old broker still exists. This is surprising but actually correct behavior.
      (2) Two brokers are configured with the same id.
      (3) ZooKeeper has a bug and is not deleting ephemeral nodes.

      Instead of just throwing the ZK NodeExistsException, we should include the above information in a well-named Kafka exception, for clarity.
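
The proposal above can be sketched roughly as follows. This is an illustration only, not the contents of KAFKA-150.patch: the exception name `BrokerAlreadyRegisteredException`, the `EphemeralNodeCreator` interface, and the helper `describeConflict` are hypothetical, and `NodeExistsStandIn` merely stands in for zkclient's `ZkNodeExistsException` so that the example compiles without dependencies. Java is used here for a dependency-free example; the broker code itself is Scala.

```java
/** Sketch: translate a raw "node exists" failure into a descriptive, well-named exception. */
class BrokerRegistrationSketch {

    /** Hypothetical exception name; not an actual Kafka class. */
    static class BrokerAlreadyRegisteredException extends RuntimeException {
        BrokerAlreadyRegisteredException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for zkclient's ZkNodeExistsException, so the sketch has no dependencies. */
    static class NodeExistsStandIn extends RuntimeException {
        NodeExistsStandIn(String path) {
            super("KeeperErrorCode = NodeExists for " + path);
        }
    }

    /** Hypothetical abstraction over the ephemeral-node creation call. */
    interface EphemeralNodeCreator {
        void createEphemeral(String path, String data);
    }

    /** Builds a message listing the three possible causes from the description. */
    static String describeConflict(String path) {
        return "A broker is already registered on the path " + path + ". Possible causes: "
            + "(1) this broker was restarted within the ZooKeeper session timeout, so its old "
            + "ephemeral node has not expired yet; "
            + "(2) two brokers are configured with the same broker id; "
            + "(3) ZooKeeper failed to delete the ephemeral node.";
    }

    static void registerBroker(EphemeralNodeCreator zk, int brokerId, String data) {
        String path = "/brokers/ids/" + brokerId;
        try {
            zk.createEphemeral(path, data);
        } catch (NodeExistsStandIn e) {
            // Keep the original exception as the cause so the ZooKeeper stack trace is not lost.
            throw new BrokerAlreadyRegisteredException(describeConflict(path), e);
        }
    }

    public static void main(String[] args) {
        // Simulate ZooKeeper rejecting the create because the node already exists.
        EphemeralNodeCreator alwaysConflicts = (path, data) -> {
            throw new NodeExistsStandIn(path);
        };
        try {
            registerBroker(alwaysConflicts, 0, "10.98.20.109-1317681202194:10.98.20.109:9092");
        } catch (BrokerAlreadyRegisteredException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Wrapping the original exception as the cause, rather than replacing it, keeps the ZooKeeper-level trace available for debugging while the top-level message tells the operator what to check first.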

      Attachments

        1. KAFKA-150.patch (2 kB, Jay Kreps)

          People

            Assignee: Jay Kreps (jkreps)
            Reporter: Neha Narkhede (nehanarkhede)