KAFKA-1082

zkclient dies after UnknownHostException in zk reconnect

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.2, 0.8.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

      Description

      Moving this here from the dev list:

      I've run into the following issue with the Kafka server. The zkclient lib seems to die silently if there is an UnknownHostException (or any IOException) while reconnecting the ZK session. I've filed a bug about this with the zkclient lib (https://github.com/sgroschupf/zkclient/issues/23). The consequence for Kafka was the silent loss of all ephemeral nodes associated with the affected process.

      It is fairly easy to reproduce this locally using the following steps:
      – Configure a local Kafka broker to connect to a local ZK instance using a DNS alias (e.g. add "127.0.0.1 kafka-test-dns" to your /etc/hosts).
      – Start the broker and observe that ephemeral nodes have been added to ZK.
      – Suspend the broker process, preventing it from sending heartbeats to the ZK instance. Observe the loss of ephemeral nodes in ZK.
      – Remove the DNS alias (e.g. comment out the /etc/hosts line).
      – Resume the broker. An UnknownHostException is logged, and from this point on the server cannot re-establish its ZK connection. Re-enabling the alias, for example, does not restore normal operation. The broker continues accepting requests without participating in the ZK protocols.
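
      The failure mode above comes down to a reconnect path that gives up on the first IOException. As a rough illustration only, the sketch below retries instead of stopping. It is not zkclient's code and not the attached patch: the class ReconnectSketch, the Connector interface, and the maxAttempts and backoffMs parameters are all hypothetical names made up for this example.

```java
import java.io.IOException;

/** Hypothetical sketch; not zkclient's actual reconnect code. */
final class ReconnectSketch {

    /** Stand-in (assumed name) for whatever opens a new ZK session. */
    interface Connector {
        void connect() throws IOException;
    }

    /**
     * Tries to connect up to maxAttempts times, sleeping backoffMs between
     * failed attempts. Returns the 1-based number of the attempt that
     * succeeded, or -1 if every attempt failed or the thread was interrupted.
     */
    static int reconnectWithRetry(Connector connector, int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connector.connect();
                return attempt;
            } catch (IOException e) {
                // UnknownHostException is a subclass of IOException, so a
                // failed DNS lookup lands here and is retried rather than
                // ending the reconnect logic for good.
                System.err.println("reconnect attempt " + attempt + " failed: " + e);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

      With a loop of this shape, re-enabling the DNS alias would let a later attempt succeed. A real fix would also need to tell listeners when the session cannot be re-established, which this sketch does not attempt.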

      1. KAFKA-1082.patch
        15 kB
        Anatoly Fayngelerin

        Activity

        Jun Rao added a comment -

        Thanks for the patch. Commented on the RB. Also, we need to make sure the new zkclient 0.4 jar is backward compatible, i.e., it can be a drop-in replacement for the 0.3 and 0.1 jars without causing any runtime issues. I did some verification on the broker and the consumer side. It does seem to be binary backward compatible, even though there is a new method in the state change listener interface. It would be good if you could confirm this.

        Anatoly Fayngelerin added a comment -

        Sorry, first time running the reviewboard script: https://reviews.apache.org/r/14582

        Anatoly Fayngelerin added a comment -

        Created reviewboard


          People

          • Assignee: Unassigned
          • Reporter: Anatoly Fayngelerin
          • Votes: 0
          • Watchers: 2
