Kafka / KAFKA-7974

KafkaAdminClient loses worker thread/enters zombie state when initial DNS lookup fails


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.1.1
    • Fix Version/s: 2.3.0, 2.1.2, 2.2.1
    • Component/s: admin
    • Labels: None

    Description

      Version: kafka-clients-2.1.0

      I have some code that creates a KafkaAdminClient instance and then invokes listTopics(). I was seeing the following stack trace in the logs, after which the KafkaAdminClient instance became unresponsive:

      ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1':
      java.lang.IllegalStateException: No entry found for connection 0
          at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330)
          at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134)
          at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921)
          at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
          at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898)
          at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113)
          at java.lang.Thread.run(Thread.java:748)

      From looking at the code I was able to trace down a possible cause:

      • NetworkClient.ready() invokes this.initiateConnect() as seen in the above stacktrace
      • NetworkClient.initiateConnect() invokes ClusterConnectionStates.connecting(), which internally invokes ClientUtils.resolve() to resolve the host when creating an entry for the connection.
      • If this host lookup fails, an UnknownHostException can be thrown back to NetworkClient.initiateConnect(), and the connection entry is never created in ClusterConnectionStates. This exception doesn't get logged, so this step is a guess on my part.
      • NetworkClient.initiateConnect() catches the exception and attempts to call ClusterConnectionStates.disconnected(), which throws an IllegalStateException because no entry had yet been created due to the lookup failure.
      • This IllegalStateException ends up killing the worker thread and KafkaAdminClient gets stuck, never returning from listTopics().
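
      The suspected sequence above can be modelled with a small standalone sketch. This is not Kafka's code: the class ZombieWorkerSketch, its nested ConnectionStates, and the lookupFails flag are hypothetical stand-ins that only mimic the behaviour described in the bullets (a failed lookup leaves no entry, the catch block then asks for that entry, and the uncaught exception ends the worker thread so the caller's future is never completed).

```java
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the failure described in this report; not Kafka's classes.
class ZombieWorkerSketch {

    // Stand-in for a per-node connection-state registry.
    static class ConnectionStates {
        private final Map<String, String> states = new HashMap<>();

        // Assumption: the host is resolved before the entry is stored,
        // so a failed lookup leaves no entry behind.
        void connecting(String id, boolean lookupFails) throws UnknownHostException {
            if (lookupFails) {
                throw new UnknownHostException("simulated lookup failure for node " + id);
            }
            states.put(id, "CONNECTING");
        }

        // Requires an existing entry, matching the message in the stack trace.
        void disconnected(String id) {
            if (!states.containsKey(id)) {
                throw new IllegalStateException("No entry found for connection " + id);
            }
            states.put(id, "DISCONNECTED");
        }
    }

    // The catch block assumes connecting() always created an entry.
    static void initiateConnect(ConnectionStates states, String id, boolean lookupFails) {
        try {
            states.connecting(id, lookupFails);
        } catch (UnknownHostException e) {
            states.disconnected(id); // throws IllegalStateException when no entry exists
        }
    }

    // Starts a worker that is supposed to complete the future, then reports
    // how the caller's bounded wait ended.
    static String runOnce(boolean lookupFails) {
        CompletableFuture<String> result = new CompletableFuture<>();
        Thread worker = new Thread(() -> {
            initiateConnect(new ConnectionStates(), "0", lookupFails);
            result.complete("topics"); // never reached if the line above throws
        }, "sketch-worker");
        // Keep the demo output quiet; the uncaught exception still ends the thread.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        try {
            worker.join();
            return result.get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException | ExecutionException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce(false)); // prints "topics"
        System.out.println(runOnce(true));  // prints "timed out"
    }
}
```

      In this model the caller only notices the dead worker because it waits with a timeout; an unbounded get() on the same future would block forever, which matches the unresponsive listTopics() call described above.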

            People

              Assignee: Unassigned
              Reporter: Nicholas Parker (nickbp)