  Kafka / KAFKA-7974

KafkaAdminClient loses worker thread/enters zombie state when initial DNS lookup fails


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.1.1
    • Fix Version/s: 2.3.0, 2.1.2, 2.2.1
    • Component/s: admin
    • Labels:
      None

      Description

      Version: kafka-clients-2.1.0

      I have some code that creates a KafkaAdminClient instance and then invokes listTopics(). I was seeing the following stack trace in the logs, after which the KafkaAdminClient instance became unresponsive:

      ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1':
      java.lang.IllegalStateException: No entry found for connection 0
          at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330)
          at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134)
          at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921)
          at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
          at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898)
          at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113)
          at java.lang.Thread.run(Thread.java:748)
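The stack trace shows the exception escaping AdminClientRunnable.run() on the single kafka-admin-client-thread worker. The stdlib-only sketch here models why that leaves the client hung rather than failing fast: a hypothetical client hands every call to one worker thread, so once an unchecked exception kills that thread, queued and future calls are never completed. ZombieWorkerDemo, submitCall and awaitWorkerExit are illustrative names, not Kafka APIs, and the model is far simpler than KafkaAdminClient.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical single-worker client; NOT KafkaAdminClient.
class ZombieWorkerDemo {
    private final BlockingQueue<CompletableFuture<String>> pending = new LinkedBlockingQueue<>();
    private final Thread worker;

    ZombieWorkerDemo() {
        worker = new Thread(this::runLoop, "demo-admin-client-thread");
        // The exception is logged, but nothing restarts the thread or fails pending calls.
        worker.setUncaughtExceptionHandler((t, e) ->
                System.err.println("Uncaught exception in thread '" + t.getName() + "': " + e));
        worker.setDaemon(true);
        worker.start();
    }

    private void runLoop() {
        while (true) {
            CompletableFuture<String> call;
            try {
                call = pending.take();
            } catch (InterruptedException e) {
                return;
            }
            handle(call); // an exception here ends the loop and the thread
        }
    }

    // Simulated bug: throws before completing the call, like the
    // IllegalStateException in the stack trace above.
    private void handle(CompletableFuture<String> call) {
        throw new IllegalStateException("No entry found for connection 0");
    }

    // Queues a call; the returned future is completed only by the worker.
    CompletableFuture<String> submitCall() {
        CompletableFuture<String> call = new CompletableFuture<>();
        pending.add(call);
        return call;
    }

    // Waits up to 'millis' for the worker to terminate; returns true if it is dead.
    boolean awaitWorkerExit(long millis) {
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) throws Exception {
        ZombieWorkerDemo client = new ZombieWorkerDemo();
        CompletableFuture<String> first = client.submitCall();
        System.out.println("worker exited: " + client.awaitWorkerExit(5_000));
        // The queue still accepts work, but nobody will ever complete it.
        CompletableFuture<String> second = client.submitCall();
        try {
            second.get(1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("second call timed out; first done = " + first.isDone());
        }
    }
}
```

Bounding the wait with get(timeout, unit) is what turns the silent hang into a visible TimeoutException on the caller's side; an unbounded get() would block forever.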

      From reading the code, I was able to trace a possible cause:

      • NetworkClient.ready() invokes this.initiateConnect(), as seen in the stack trace above.
      • NetworkClient.initiateConnect() invokes ClusterConnectionStates.connecting(), which internally invokes ClientUtils.resolve() to resolve the host when creating an entry for the connection.
      • If this host lookup fails, an UnknownHostException can be thrown back to NetworkClient.initiateConnect() and the connection entry is never created in ClusterConnectionStates. This exception is not logged, so this step is a guess on my part.
      • NetworkClient.initiateConnect() catches the exception and attempts to call ClusterConnectionStates.disconnected(), which throws an IllegalStateException because no entry had yet been created due to the lookup failure.
      • This IllegalStateException ends up killing the worker thread and KafkaAdminClient gets stuck, never returning from listTopics().
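The sequence in the list above can be reproduced with a deliberately simplified model. The classes here are hypothetical stand-ins that borrow method names from the report (connecting, disconnected, initiateConnect); they are not Kafka's implementations. The resolver is injected so the lookup failure is deterministic, and initiateConnectGuarded is one possible defensive variant, not necessarily how the issue was fixed upstream.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the reported sequence; NOT Kafka's NetworkClient.
class InitiateConnectModel {
    // Mirrors the catch block described in the report: on failure, mark the node disconnected.
    static void initiateConnect(ConnectionStatesModel states, String id, String host) {
        try {
            states.connecting(id, host);
        } catch (UnknownHostException e) {
            // No entry was recorded, so this throws IllegalStateException, which escapes
            // initiateConnect() (in the real client it escaped the worker's run loop).
            states.disconnected(id);
        }
    }

    // One possible defensive variant (hypothetical, not necessarily the upstream fix):
    // only roll back state that actually exists.
    static void initiateConnectGuarded(ConnectionStatesModel states, String id, String host) {
        try {
            states.connecting(id, host);
        } catch (UnknownHostException e) {
            if (states.hasEntry(id)) {
                states.disconnected(id);
            }
            // Otherwise there is no state to update, so the method just returns and a
            // later attempt can retry the lookup.
        }
    }

    public static void main(String[] args) {
        // Resolver that always fails, standing in for a DNS outage.
        ConnectionStatesModel states =
                new ConnectionStatesModel(host -> { throw new UnknownHostException(host); });
        try {
            initiateConnect(states, "0", "broker.invalid");
        } catch (IllegalStateException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
        initiateConnectGuarded(states, "0", "broker.invalid");
        System.out.println("guarded: returned normally, entry present = " + states.hasEntry("0"));
    }
}

// Hypothetical stand-in for ClusterConnectionStates; NOT the real implementation.
class ConnectionStatesModel {
    enum State { CONNECTING, DISCONNECTED }

    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private final Map<String, State> states = new HashMap<>();
    private final Resolver resolver;

    ConnectionStatesModel(Resolver resolver) {
        this.resolver = resolver;
    }

    // Resolves the host before recording the entry, so a lookup failure leaves
    // no entry behind (the ordering the report suspects).
    void connecting(String id, String host) throws UnknownHostException {
        resolver.resolve(host);
        states.put(id, State.CONNECTING);
    }

    // Like the nodeState() lookup in the stack trace: fails if the id was never registered.
    void disconnected(String id) {
        if (!states.containsKey(id)) {
            throw new IllegalStateException("No entry found for connection " + id);
        }
        states.put(id, State.DISCONNECTED);
    }

    boolean hasEntry(String id) {
        return states.containsKey(id);
    }
}
```

Passing InetAddress::getAllByName as the resolver exercises a real DNS lookup instead of the simulated failure.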


              People

              • Assignee: Unassigned
              • Reporter: nickbp Nicholas Parker
              • Votes: 1
              • Watchers: 3
