Currently the connection code for new clients marks the time when a connection was initiated (NodeConnectionState.lastConnectMs) and then uses this to compute blackout periods for nodes, during which connections will not be attempted and the node is not considered a candidate for leastLoadedNode.
However, when a connection attempt takes longer than the blackout/backoff period (default 10ms), this produces incorrect behavior. If a broker is unavailable and does not explicitly reject the connection, instead leaving the client to wait for a connection timeout (e.g. because a firewall silently drops packets), then by the time the attempt fails the backoff period has already elapsed. The node is immediately considered ready for a new connection attempt and can again be selected by leastLoadedNode for metadata updates. This should be easy to reproduce and verify manually by using tc to introduce enough latency that connection failures take longer than 10ms.
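To make the timing concrete, here is a minimal sketch of the problematic logic. This is not Kafka's actual code; NodeConnectionState here is a simplified stand-in and the field names are illustrative. The key point is that the backoff window is anchored at the time the attempt was initiated:

```java
// Simplified model of the current behavior: the backoff window is anchored
// at the time the connection attempt was *initiated*.
class NodeConnectionState {
    long lastConnectMs;                 // set when connect() is initiated
    static final long BACKOFF_MS = 10;  // default reconnect backoff

    boolean isBlackedOut(long nowMs) {
        return nowMs - lastConnectMs < BACKOFF_MS;
    }
}

public class BackoffFromStartDemo {
    public static void main(String[] args) {
        NodeConnectionState node = new NodeConnectionState();
        node.lastConnectMs = 0;  // attempt starts at t=0
        // A firewall drops packets, so the attempt only fails at t=30 via a
        // connection timeout, i.e. well past the 10ms backoff window.
        long disconnectMs = 30;
        // The node is not blacked out at the moment it disconnects, so it is
        // immediately eligible for reconnection and leastLoadedNode selection.
        System.out.println("blacked out at disconnect: " + node.isBlackedOut(disconnectMs));
        // prints "blacked out at disconnect: false"
    }
}
```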
The correct behavior would use the disconnection event to mark the end of the last connection attempt and only start counting the backoff period from that point.
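A sketch of that corrected behavior, under the same simplified model (the names are hypothetical, not Kafka's actual API): the backoff is anchored at the disconnection rather than at the start of the attempt, so a slow failure can no longer consume its own backoff window:

```java
// Simplified model of the fixed behavior: the backoff window is anchored
// at the *disconnection*, i.e. the end of the last connection attempt.
class NodeConnectionState {
    static final long BACKOFF_MS = 10;
    // Initialized so a node with no prior attempt is immediately eligible.
    long lastDisconnectMs = -BACKOFF_MS;

    void disconnected(long nowMs) {
        lastDisconnectMs = nowMs;  // mark the end of the failed attempt
    }

    boolean isBlackedOut(long nowMs) {
        return nowMs - lastDisconnectMs < BACKOFF_MS;
    }
}

public class BackoffFromDisconnectDemo {
    public static void main(String[] args) {
        NodeConnectionState node = new NodeConnectionState();
        // The attempt that started at t=0 only fails at t=30.
        node.disconnected(30);
        // The node now stays blacked out until t=40, regardless of how long
        // the failed attempt itself took.
        System.out.println("blacked out at t=35: " + node.isBlackedOut(35)); // true
        System.out.println("blacked out at t=45: " + node.isBlackedOut(45)); // false
    }
}
```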
See http://mail-archives.apache.org/mod_mbox/kafka-users/201508.mbox/%3CCAJY8EofpeU4%2BAJ%3Dw91HDUx2RabjkWoU00Z%3DcQ2wHcQSrbPT4HA%40mail.gmail.com%3E for the original description of the problem.
This is related to KAFKA-1843, since leastLoadedNode will consistently choose the same node when the blackout period is not handled correctly, but it is a much smaller issue.