Kafka / KAFKA-1082

zkclient dies after UnknownHostException in zk reconnect

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.7.2, 0.8.0
    • Fix Version/s: 0.9.0.0
    • Component/s: core
    • Labels: None

    Description

      Moving this here from the dev list:

      I've run into the following issue with the Kafka server. The zkclient lib seems to die silently if there is an UnknownHostException (or any other IOException) while reconnecting the ZK session. I've filed a bug about this with the zkclient lib (https://github.com/sgroschupf/zkclient/issues/23). For Kafka, the consequence was the silent loss of all ephemeral nodes associated with the affected process.
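
The failure mode above comes down to an IOException during reconnect ending the reconnect attempts altogether. As a rough illustration of the alternative behaviour (keep retrying until the lookup and connection succeed), here is a minimal sketch. It is not zkclient's or Kafka's code: `ReconnectSketch`, `Connector`, `reconnectWithRetry`, and the backoff values are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.net.UnknownHostException;

final class ReconnectSketch {

    /** Hypothetical stand-in for whatever opens a new ZK session. */
    interface Connector {
        void connect() throws IOException;
    }

    /**
     * Calls connect() until it succeeds or maxAttempts is exhausted.
     * Returns the number of attempts used; rethrows the last IOException
     * if every attempt fails, so the failure is never silent.
     */
    static int reconnectWithRetry(Connector connector, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connector.connect();
                return attempt;
            } catch (IOException e) {
                // UnknownHostException is a subclass of IOException, so it lands here too.
                last = e;
                System.err.println("reconnect attempt " + attempt + " failed: " + e);
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated connector: the DNS alias is missing for the first two attempts.
        int[] calls = {0};
        Connector flaky = () -> {
            if (++calls[0] < 3) {
                throw new UnknownHostException("kafka-test-dns");
            }
        };
        int used = reconnectWithRetry(flaky, 5, 10L);
        System.out.println("connected after " + used + " attempts");
    }
}
```

With the simulated connector, the loop survives two UnknownHostExceptions and connects on the third attempt. A real client would also need to re-register its ephemeral nodes once the new session is established.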

      It is fairly easy to reproduce this locally using the following steps:
      – Configure a local Kafka broker to connect to a local ZK instance using a DNS alias (e.g. add "127.0.0.1 kafka-test-dns" to your /etc/hosts).
      – Start the broker and observe that ephemeral nodes have been added to ZK.
      – Suspend the broker process, preventing it from sending heartbeats to the ZK instance. Observe the loss of ephemeral nodes in ZK.
      – Remove the DNS alias (e.g. comment out the /etc/hosts line).
      – Resume the broker. An UnknownHostException is logged, and from this point on the server cannot re-establish its ZK connection. Re-enabling the alias, for example, does not restore normal operation. The broker continues accepting requests without participating in the ZK protocols.
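
When walking through the steps above, it can help to confirm whether the alias currently resolves before suspending or resuming the broker. The following is a small sketch of such a check. `AliasProbe` and `resolves` are hypothetical helper names, and the default host is the example alias `kafka-test-dns` from the steps.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class AliasProbe {

    /** Returns true if the host name (or IP literal) can be resolved right now. */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            // The same exception type the broker logs when the alias is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the example alias; pass another host name as the first argument.
        String host = args.length > 0 ? args[0] : "kafka-test-dns";
        System.out.println(host + (resolves(host) ? " resolves" : " does not resolve"));
    }
}
```

Running it as a fresh process before and after editing /etc/hosts shows which state the alias is in at each step.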

      Attachments

        1. KAFKA-1082.patch
          15 kB
          Anatoly Fayngelerin

            People

              Assignee: Gwen Shapira
              Reporter: Anatoly Fayngelerin
              Votes: 0
              Watchers: 4