Kafka / KAFKA-620

UnknownHostError looking for a ZK node crashes the broker

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.7.1
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None
    • Environment:
      Linux, Amazon's AMI

      Description

      If you completely kill a ZooKeeper node so that its hostname no longer resolves, the broker dies with a java.net.UnknownHostException.

      You will then be unable to restart the broker until the unknown host(s) are removed from server.properties.

      We ran into this issue while testing our resilience to widespread AWS outages. If you can point me to the right place, I could have a go at fixing it. Unfortunately, I suspect the issue might be in the non-standard ZooKeeper library that Kafka uses.

      Here's the stack trace:
      org.I0Itec.zkclient.exception.ZkException: Unable to connect to [list of zookeepers]
      at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:66)
      at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:872)
      at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
      at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
      at kafka.server.KafkaZooKeeper.startup(KafkaZooKeeper.scala:44)
      at kafka.log.LogManager.<init>(LogManager.scala:87)
      at kafka.server.KafkaServer.startup(KafkaServer.scala:58)
      at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
      at kafka.Kafka$.main(Kafka.scala:50)
      at kafka.Kafka.main(Kafka.scala)
      Caused by: java.net.UnknownHostException: zk-101
      at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
      at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:850)
      at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1201)
      at java.net.InetAddress.getAllByName0(InetAddress.java:1154)
      at java.net.InetAddress.getAllByName(InetAddress.java:1084)
      at java.net.InetAddress.getAllByName(InetAddress.java:1020)
      at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:387)
      at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:332)
      at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:383)
      at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:64)
      ... 9 more
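
The trace shows the lookup failing in InetAddress.getAllByName, called from the ZooKeeper client constructor. As a rough diagnostic, the same lookup can be run against each host in a connect string before starting the broker. This is only a sketch: the class and method names (ZkHostCheck, hosts, resolves) are hypothetical, "localhost" in the default argument is an assumption added for illustration, and the parsing assumes plain host:port entries with an optional trailing chroot path (no IPv6 literals).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kafka, zkclient, or ZooKeeper.
class ZkHostCheck {
    // Extracts host names from a connect string such as "zk-101:2181,zk-102:2181/chroot".
    static List<String> hosts(String connect) {
        // Anything from the first '/' onward is treated as a chroot path and ignored.
        int slash = connect.indexOf('/');
        String servers = slash >= 0 ? connect.substring(0, slash) : connect;
        List<String> out = new ArrayList<>();
        for (String part : servers.split(",")) {
            String s = part.trim();
            if (s.isEmpty()) continue;
            int colon = s.lastIndexOf(':');
            out.add(colon >= 0 ? s.substring(0, colon) : s);
        }
        return out;
    }

    // Performs the same lookup that appears in the stack trace.
    static boolean resolves(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "zk-101" comes from the trace above; "localhost" is only an example.
        String connect = args.length > 0 ? args[0] : "zk-101:2181,localhost:2181";
        for (String h : hosts(connect)) {
            System.out.println(h + " -> " + (resolves(h) ? "resolves" : "UNKNOWN HOST"));
        }
    }
}
```

Running it with the broker's connect string as the only argument lists which hosts would trigger the exception.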

          Activity

          Jun Rao added a comment -

          Do you have other ZK hosts in your connection string?

          Neha Narkhede added a comment -

          I would check whether the ZooKeeper connection string specifies multiple ZooKeeper hosts. AFAIK, the ZooKeeper client library (not zkclient) can retry other ZooKeeper hosts if one is not reachable. I would expect this exception to be caught within the ZooKeeper client itself, so that Kafka gets an exception only when none of the ZooKeeper hosts are reachable. In that case, we probably don't want to start a 0.8 Kafka broker, since the 0.8 logic requires ZooKeeper to be available.
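
The behavior described in this comment (fail only when no host is usable) could be approximated on the caller's side by dropping unresolvable hosts from the connect string before handing it to the client. This is a sketch under stated assumptions, not how ZooKeeper or zkclient behaves: ResolvableServers, keepResolvable, and dnsResolves are hypothetical names, the check covers DNS resolution only (not whether a host is reachable), and the parsing assumes plain host:port entries with an optional trailing chroot path.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Kafka, zkclient, or ZooKeeper.
class ResolvableServers {
    // Returns the connect string with unresolvable hosts removed.
    // Throws only when no host at all passes the resolver check.
    static String keepResolvable(String connect, Predicate<String> resolves) {
        int slash = connect.indexOf('/');
        String servers = slash >= 0 ? connect.substring(0, slash) : connect;
        String chroot = slash >= 0 ? connect.substring(slash) : "";
        List<String> kept = new ArrayList<>();
        for (String part : servers.split(",")) {
            String s = part.trim();
            if (s.isEmpty()) continue;
            int colon = s.lastIndexOf(':');
            String host = colon >= 0 ? s.substring(0, colon) : s;
            if (resolves.test(host)) kept.add(s);
        }
        if (kept.isEmpty()) {
            throw new IllegalStateException("no resolvable ZooKeeper host in: " + connect);
        }
        return String.join(",", kept) + chroot;
    }

    // Default resolver: a plain DNS lookup via the JDK.
    static boolean dnsResolves(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

A caller would pass `ResolvableServers::dnsResolves` as the resolver. Taking the resolver as a parameter keeps the filtering logic testable without real DNS.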

          Guozhang Wang added a comment -

          Is this a duplicate of KAFKA-1082?

          Neha Narkhede added a comment -

          Guozhang Wang, it is. Feel free to close this as a duplicate so we can track the problem as part of KAFKA-1082.


            People

            • Assignee:
              Unassigned
              Reporter:
              Matthew Rathbone
            • Votes:
              0
              Watchers:
              6
