Kafka / KAFKA-2459

Connection backoff/blackout period should start when a connection is disconnected, not when the connection attempt was initiated



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: clients, consumer, producer
    • None


      Currently, the connection code for the new clients records the time at which a connection attempt was initiated (NodeConnectionState.lastConnectMs) and uses it to compute the blackout period for each node. During that period, no connection is attempted and the node is not considered a candidate for leastLoadedNode.

      However, this results in incorrect behavior when a connection attempt takes longer than the blackout/backoff period (default 10ms). For example, if a broker is unavailable and does not explicitly reject the connection, so that the client instead waits for a connection timeout (e.g. due to firewall settings), the backoff period will already have elapsed by the time the attempt fails. The node is then immediately considered ready for a new connection attempt and eligible to be selected by leastLoadedNode for metadata updates. I think it should be easy to reproduce and verify this problem manually by using tc to introduce enough latency to make connection failures take > 10ms.

      The correct behavior would use the disconnection event to mark the end of the last connection attempt and then wait for the backoff period to elapse after that.
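
      To make the difference concrete, here is a minimal sketch of the two ways of measuring the backoff. It is not the client's actual code: BackoffSketch, isBlackedOut, and the 30s connection timeout are hypothetical names and values chosen only for illustration; the 10ms backoff is the default cited above.

```java
// Illustrative sketch only; BackoffSketch and its members are hypothetical
// names, not the Kafka client's actual classes or methods.
public class BackoffSketch {
    // Backoff period in milliseconds (the 10ms default mentioned in this issue).
    static final long BACKOFF_MS = 10L;

    // A node is blacked out while less than backoffMs has elapsed since backoffStartMs.
    static boolean isBlackedOut(long backoffStartMs, long nowMs, long backoffMs) {
        return nowMs - backoffStartMs < backoffMs;
    }

    public static void main(String[] args) {
        long connectInitiatedMs = 0L;    // connection attempt starts
        long disconnectedMs = 30_000L;   // attempt fails after an assumed 30s timeout
        long nowMs = disconnectedMs + 1; // 1ms after the failure is observed

        // Behavior described in this issue: backoff measured from initiation.
        System.out.println("from initiation:    "
                + isBlackedOut(connectInitiatedMs, nowMs, BACKOFF_MS));
        // Proposed behavior: backoff measured from the disconnection event.
        System.out.println("from disconnection: "
                + isBlackedOut(disconnectedMs, nowMs, BACKOFF_MS));
    }
}
```

      Measured from initiation, the node is no longer blacked out 1ms after the failure, so it would be retried immediately. Measured from the disconnection, it stays blacked out until the full 10ms has passed.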

      See http://mail-archives.apache.org/mod_mbox/kafka-users/201508.mbox/%3CCAJY8EofpeU4%2BAJ%3Dw91HDUx2RabjkWoU00Z%3DcQ2wHcQSrbPT4HA%40mail.gmail.com%3E for the original description of the problem.

      This is related to KAFKA-1843, because leastLoadedNode will consistently choose the same node if this blackout period is not handled correctly, but it is a much smaller issue.





              enothereska Eno Thereska
              ewencp Ewen Cheslack-Postava
              Guozhang Wang Guozhang Wang