I have some code that creates a KafkaAdminClient instance and then invokes listTopics(). I was seeing the following stacktrace in the logs, after which the KafkaAdminClient instance became unresponsive:
From looking at the code, I was able to track down a possible cause:
- NetworkClient.ready() invokes this.initiateConnect() as seen in the above stacktrace
- NetworkClient.initiateConnect() invokes ClusterConnectionStates.connecting(), which internally invokes ClientUtils.resolve() to resolve the host when creating an entry for the connection.
- If this host lookup fails, an UnknownHostException can be thrown back to NetworkClient.initiateConnect() and the connection entry is not created in ClusterConnectionStates. This exception doesn't get logged, so this is a guess on my part.
- NetworkClient.initiateConnect() catches the exception and attempts to call ClusterConnectionStates.disconnected(), which throws an IllegalStateException because no entry had yet been created due to the lookup failure.
- This IllegalStateException ends up killing the worker thread and KafkaAdminClient gets stuck, never returning from listTopics().
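
To make the suspected sequence easier to follow, here is a minimal, self-contained sketch of it. This is not the actual Kafka code: ConnectFailureSketch, ConnectionStates, the simulated resolve() and the exception message are hypothetical stand-ins that only mirror the order of operations described above (lookup first, entry created afterwards, disconnected() called from the catch block).

```java
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

public class ConnectFailureSketch {

    /** Hypothetical stand-in for ClusterConnectionStates; not the real Kafka class. */
    static class ConnectionStates {
        private final Map<String, String> nodeState = new HashMap<>();

        // The lookup runs before the entry is stored, so a failed lookup leaves no entry behind.
        void connecting(String id, String host) throws UnknownHostException {
            resolve(host);
            nodeState.put(id, "CONNECTING");
        }

        // Refuses to update a connection that was never registered.
        void disconnected(String id) {
            if (!nodeState.containsKey(id)) {
                // Illustrative message, not copied from Kafka.
                throw new IllegalStateException("No entry found for connection " + id);
            }
            nodeState.put(id, "DISCONNECTED");
        }

        // Simulated DNS lookup (assumption): only "localhost" resolves.
        private static void resolve(String host) throws UnknownHostException {
            if (!"localhost".equals(host)) {
                throw new UnknownHostException(host);
            }
        }
    }

    /** Hypothetical stand-in for NetworkClient.initiateConnect(). */
    static void initiateConnect(ConnectionStates states, String id, String host) {
        try {
            states.connecting(id, host);
        } catch (UnknownHostException e) {
            // The lookup failure is swallowed here; this call then throws
            // IllegalStateException because no entry exists for this id.
            states.disconnected(id);
        }
    }

    /** Runs one connection attempt and reports what escaped initiateConnect(). */
    static String run(String host) {
        ConnectionStates states = new ConnectionStates();
        try {
            initiateConnect(states, "1", host);
            return "ok";
        } catch (IllegalStateException e) {
            return "IllegalStateException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("localhost"));
        System.out.println(run("unresolvable.invalid"));
    }
}
```

In this sketch the original UnknownHostException never surfaces; only the IllegalStateException escapes, which would match the lookup failure not appearing in the logs. In the real client nothing like the catch in run() appears to exist on the worker thread, which is why the thread dies.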