Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Description
We were looking into test failures here: https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1702475525--jolshan--kafka-15784--7cad567675/2023-12-13--001./2023-12-13--001./report.html.
Here is the first failure in the report:
====================================================================================================
test_id: kafkatest.tests.core.group_mode_transactions_test.GroupModeTransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=brokers
status: FAIL
run time: 3 minutes 4.950 seconds
TimeoutError('Consumer consumed only 88223 out of 100000 messages in 90s')
We traced the failure to an apparent bug in the last rebalance before the group became empty. The last remaining instance appears to receive an incomplete assignment, which prevents it from consuming all expected messages on some partitions. Here is the rebalance from the coordinator's perspective:
server.log.2023-12-13-04:[2023-12-13 04:58:56,987] INFO [GroupCoordinator 3]: Stabilized group grouped-transactions-test-consumer-group generation 5 (__consumer_offsets-2) with 1 members (kafka.coordinator.group.GroupCoordinator)
server.log.2023-12-13-04:[2023-12-13 04:58:56,990] INFO [GroupCoordinator 3]: Assignment received from leader consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd for group grouped-transactions-test-consumer-group for generation 5. The group has 1 members, 0 of which are static. (kafka.coordinator.group.GroupCoordinator)
The group is down to one member in generation 5. In the previous generation, the consumer in question reported this assignment:
// Gen 4: we've got partitions 0-4
[2023-12-13 04:58:52,631] DEBUG [Consumer clientId=consumer-grouped-transactions-test-consumer-group-1, groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete with generation 4 and memberId consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2023-12-13 04:58:52,631] INFO [Consumer clientId=consumer-grouped-transactions-test-consumer-group-1, groupId=grouped-transactions-test-consumer-group] Notifying assignor about the new Assignment(partitions=[input-topic-0, input-topic-1, input-topic-2, input-topic-3, input-topic-4]) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
However, in generation 5, we seem to be assigned only one partition:
// Gen 5: Now we have only partition 1? But aren't we the last member in the group?
[2023-12-13 04:58:56,954] DEBUG [Consumer clientId=consumer-grouped-transactions-test-consumer-group-1, groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete with generation 5 and memberId consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2023-12-13 04:58:56,955] INFO [Consumer clientId=consumer-grouped-transactions-test-consumer-group-1, groupId=grouped-transactions-test-consumer-group] Notifying assignor about the new Assignment(partitions=[input-topic-1]) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
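To make "incomplete" concrete: with an eager strategy such as range, every partition of a subscribed topic is assigned to some subscribed member, so a sole member should end up owning all of them. The sketch below checks that coverage invariant for generation 5. It assumes input-topic has exactly the five partitions (0-4) the consumer held in generation 4. The class and method names are hypothetical helpers, not Kafka APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AssignmentCoverageCheck {

    // Returns the partition indexes of one topic that no member was assigned.
    // Assumption: partitions are identified by their index 0..numPartitions-1.
    static Set<Integer> unassignedPartitions(int numPartitions, Map<String, List<Integer>> assignments) {
        Set<Integer> missing = new TreeSet<>();
        for (int p = 0; p < numPartitions; p++) {
            missing.add(p);
        }
        // Drop every partition that appears in some member's assignment.
        for (List<Integer> owned : assignments.values()) {
            missing.removeAll(owned);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Generation 5 as seen in the logs: a single member that received only input-topic-1.
        Map<String, List<Integer>> gen5 = Map.of("consumer-1", List.of(1));
        System.out.println("Unassigned in generation 5: " + unassignedPartitions(5, gen5));
    }
}
```

Run as-is, `main` prints `Unassigned in generation 5: [0, 2, 3, 4]`, i.e. four partitions that no consumer in the group is reading.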
According to the JoinGroup for generation 5, the selected assignment strategy is range. The decoded subscription metadata sent by the consumer is:
Subscription(topics=[input-topic], ownedPartitions=[], groupInstanceId=null, generationId=4, rackId=null)
Here is the decoded assignment from the SyncGroup:
Assignment(partitions=[input-topic-1])
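For comparison, the per-topic rule documented for Kafka's RangeAssignor lays out partitions in numeric order and members in lexicographic order, gives each member a contiguous block of numPartitions / numMembers partitions, and gives one extra partition to each of the first numPartitions % numMembers members. The sketch below is a simplified re-implementation of that rule for a single topic. It is not Kafka's code: it ignores rack awareness, static member ordering, and multi-topic handling. It again assumes input-topic has five partitions, and the member ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeSketch {

    // Simplified per-topic range assignment: members sorted lexicographically,
    // contiguous partition blocks, first (numPartitions % members) members get one extra.
    static Map<String, List<Integer>> assignRange(int numPartitions, List<String> memberIds) {
        List<String> sorted = new ArrayList<>(memberIds);
        Collections.sort(sorted);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return result;
        }
        int perMember = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = perMember + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = next; p < next + count; p++) {
                partitions.add(p);
            }
            next += count;
            result.put(sorted.get(i), partitions);
        }
        return result;
    }

    public static void main(String[] args) {
        // Generation 5: one member subscribed to input-topic (assumed 5 partitions).
        System.out.println(assignRange(5, List.of("consumer-1")));
        // Hypothetical two-member case, for contrast.
        System.out.println(assignRange(5, List.of("b", "a")));
    }
}
```

For the single-member case this prints `{consumer-1=[0, 1, 2, 3, 4]}`, which matches what the consumer held in generation 4 rather than the `[input-topic-1]` it actually received; the two-member case prints `{a=[0, 1, 2], b=[3, 4]}`.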