Currently the connection code for new clients marks the time when a connection was initiated (NodeConnectionState.lastConnectMs) and then uses this to compute blackout periods for nodes, during which connections will not be attempted and the node is not considered a candidate for leastLoadedNode.
However, when a connection attempt takes longer than the blackout/backoff period (default 10ms), this produces incorrect behavior. If a broker is unavailable and does not explicitly reject the connection, instead leaving the client to wait for a connection timeout (e.g. because a firewall silently drops packets), then by the time the attempt fails the backoff period has already elapsed. The node is immediately considered ready for a new connection attempt and can again be selected by leastLoadedNode for metadata updates. This should be easy to reproduce and verify manually by using tc to introduce enough latency that connection failures take longer than 10ms.
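To make the timing concrete, here is a minimal sketch of the problematic logic. This is not Kafka's actual code; NodeConnectionState here is a simplified stand-in and the field names are illustrative. The key point is that the backoff window is anchored at the time the attempt was initiated:

```java
// Simplified model of the current behavior: the backoff window is anchored
// at the time the connection attempt was *initiated*.
class NodeConnectionState {
    long lastConnectMs;                 // set when connect() is initiated
    static final long BACKOFF_MS = 10;  // default reconnect backoff

    boolean isBlackedOut(long nowMs) {
        return nowMs - lastConnectMs < BACKOFF_MS;
    }
}

public class BackoffFromStartDemo {
    public static void main(String[] args) {
        NodeConnectionState node = new NodeConnectionState();
        node.lastConnectMs = 0;  // attempt starts at t=0
        // A firewall drops packets, so the attempt only fails at t=30 via a
        // connection timeout, i.e. well past the 10ms backoff window.
        long disconnectMs = 30;
        // The node is not blacked out at the moment it disconnects, so it is
        // immediately eligible for reconnection and leastLoadedNode selection.
        System.out.println("blacked out at disconnect: " + node.isBlackedOut(disconnectMs));
        // prints "blacked out at disconnect: false"
    }
}
```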
The correct behavior would use the disconnection event to mark the end of the last connection attempt and only start counting the backoff period from that point.
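A sketch of that corrected behavior, under the same simplified model (the names are hypothetical, not Kafka's actual API): the backoff is anchored at the disconnection rather than at the start of the attempt, so a slow failure can no longer consume its own backoff window:

```java
// Simplified model of the fixed behavior: the backoff window is anchored
// at the *disconnection*, i.e. the end of the last connection attempt.
class NodeConnectionState {
    static final long BACKOFF_MS = 10;
    // Initialized so a node with no prior attempt is immediately eligible.
    long lastDisconnectMs = -BACKOFF_MS;

    void disconnected(long nowMs) {
        lastDisconnectMs = nowMs;  // mark the end of the failed attempt
    }

    boolean isBlackedOut(long nowMs) {
        return nowMs - lastDisconnectMs < BACKOFF_MS;
    }
}

public class BackoffFromDisconnectDemo {
    public static void main(String[] args) {
        NodeConnectionState node = new NodeConnectionState();
        // The attempt that started at t=0 only fails at t=30.
        node.disconnected(30);
        // The node now stays blacked out until t=40, regardless of how long
        // the failed attempt itself took.
        System.out.println("blacked out at t=35: " + node.isBlackedOut(35)); // true
        System.out.println("blacked out at t=45: " + node.isBlackedOut(45)); // false
    }
}
```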
See http://mail-archives.apache.org/mod_mbox/kafka-users/201508.mbox/%3CCAJY8EofpeU4%2BAJ%3Dw91HDUx2RabjkWoU00Z%3DcQ2wHcQSrbPT4HA%40mail.gmail.com%3E for the original description of the problem.
This is related to KAFKA-1843, since leastLoadedNode will consistently choose the same node when the blackout period is not handled correctly, but it is a much smaller issue.