[KAFKA-9815] Consumer may never re-join if inconsistent metadata is received once

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.5.0, 2.4.2
Component/s: consumer
Labels: None

Description

~~KAFKA-9797~~ is the result of an incorrect rolling upgrade test in which a new listener is added to the brokers and set as the inter-broker listener within the same rolling upgrade. As a result, metadata is inconsistent across brokers until the rolling upgrade completes, because inter-broker communication is broken until all brokers have the new listener. The test fails due to consumer timeouts, sometimes simply because the upgrade takes longer than the consumer timeout. But several logs show an issue in the consumer itself: a metadata response received during the upgrade differs from the consumer's cached `assignmentSnapshot`, which triggers a rebalance.

In https://github.com/apache/kafka/blob/7f640f13b4d486477035c3edb28466734f053beb/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L750, `rejoinNeededOrPending()` returns true if `assignmentSnapshot` is not the same as the current `metadataSnapshot`. We don't set `rejoinNeeded` in the instance, but we revoke partitions and send a `JoinGroup` request. If that `JoinGroup` request fails and a subsequent metadata response contains the same snapshot value as the previously cached `assignmentSnapshot`, we never send `JoinGroup` again, since the snapshots match and `rejoinNeeded=false`. No partitions are assigned to the consumer after this, and the test fails because messages are not received.
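To make the sequence concrete, here is a minimal toy model of the decision described above. It is not the real `ConsumerCoordinator`: the class `RejoinModel`, its string-valued snapshots, the `latchOnMismatch` flag and the helper `stuckAfterFailedJoin` are all hypothetical names introduced for illustration. The "latch" variant shows one plausible remedy (remembering the mismatch in `rejoinNeeded`), offered as a sketch rather than as a description of the change in the linked pull request.

```java
import java.util.Objects;

/** Toy model of the rejoin decision; NOT the real ConsumerCoordinator. */
final class RejoinModel {
    // Hypothetical stand-ins for the coordinator's cached state.
    private String assignmentSnapshot;
    private String metadataSnapshot;
    private boolean rejoinNeeded = false;
    // Assumption: when true, a snapshot mismatch is remembered until a join completes.
    private final boolean latchOnMismatch;

    RejoinModel(String initialSnapshot, boolean latchOnMismatch) {
        this.assignmentSnapshot = initialSnapshot;
        this.metadataSnapshot = initialSnapshot;
        this.latchOnMismatch = latchOnMismatch;
    }

    /** A new metadata response replaces the current metadata snapshot. */
    void onMetadataUpdate(String snapshot) {
        this.metadataSnapshot = snapshot;
    }

    /** Mirrors the check described in this issue: a mismatch requests a rejoin. */
    boolean rejoinNeededOrPending() {
        if (!Objects.equals(assignmentSnapshot, metadataSnapshot)) {
            if (latchOnMismatch) {
                rejoinNeeded = true; // remember the request across later metadata updates
            }
            return true;
        }
        return rejoinNeeded;
    }

    /** Only a successful join refreshes the assignment snapshot and clears the flag. */
    void onJoinComplete() {
        this.assignmentSnapshot = this.metadataSnapshot;
        this.rejoinNeeded = false;
    }

    /** Replays the scenario; returns true if the consumer ends up never rejoining. */
    static boolean stuckAfterFailedJoin(boolean latchOnMismatch) {
        RejoinModel c = new RejoinModel("v1", latchOnMismatch);
        c.onMetadataUpdate("v2");                  // one inconsistent metadata response
        boolean first = c.rejoinNeededOrPending(); // true: partitions revoked, JoinGroup sent
        // The JoinGroup request fails here, so onJoinComplete() is never called.
        c.onMetadataUpdate("v1");                  // metadata reverts to the cached value
        return first && !c.rejoinNeededOrPending();
    }

    public static void main(String[] args) {
        System.out.println("stuck without latch: " + stuckAfterFailedJoin(false));
        System.out.println("stuck with latch:    " + stuckAfterFailedJoin(true));
    }
}
```

In this model, the unlatched variant ends with the snapshots matching and `rejoinNeeded=false` even though partitions were already revoked, which is the stuck state reported here. The latched variant keeps requesting a rejoin until a join actually completes.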

Even though this particular system test isn't a valid upgrade scenario, we should fix the consumer, since temporary metadata differences between brokers can result in this scenario.

Issue Links

links to

GitHub Pull Request #8420

People

Assignee: Rajini Sivaram

Reporter: Rajini Sivaram

Dates

Created: 03/Apr/20 15:18

Updated: 07/Apr/20 00:08

Resolved: 07/Apr/20 00:08