Description
When a consumer is created for a new group, the group metadata's protocol type is set to 'consumer', and this is stored both in __consumer_offsets and in the coordinator's local cache.
If the consumer leaves the group and the group becomes empty, ListGroups requests will continue to show the group as type 'consumer', and as such kafka-consumer-groups.sh will show it via --list.
However, if the coordinator (broker) node is killed and a new coordinator is elected, when the GroupMetadataManager loads the group from __consumer_offsets into its cache, it will not set the protocolType if there are no active consumers. As a result, the group's protocolType will now become the empty string (UNKNOWN_PROTOCOL_TYPE), and kafka-consumer-groups.sh will no longer show the group.
Ideally, bouncing a broker should not result in the group's protocol type changing. protocolType can be set in GroupMetadataManager.readGroupMessageValue() to always reflect what's present in the persistent metadata (__consumer_offsets), regardless of whether there are active members.
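The difference between the current and the proposed load behaviour can be illustrated with a small toy model. This is only a sketch: the Group class and the loadCurrent, loadProposed and listConsumerGroups methods are hypothetical names invented for illustration, not Kafka's real classes or the actual body of readGroupMessageValue().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Kafka's real classes.
final class GroupReloadSketch {
    static final String UNKNOWN_PROTOCOL_TYPE = "";

    /** A group as persisted in (or loaded from) this model of __consumer_offsets. */
    static final class Group {
        final String groupId;
        final String protocolType;
        final List<String> members;

        Group(String groupId, String protocolType, List<String> members) {
            this.groupId = groupId;
            this.protocolType = protocolType;
            this.members = members;
        }
    }

    /** Behaviour described in this issue: protocolType is dropped when there are no members. */
    static Group loadCurrent(Group stored) {
        String type = stored.members.isEmpty() ? UNKNOWN_PROTOCOL_TYPE : stored.protocolType;
        return new Group(stored.groupId, type, stored.members);
    }

    /** Proposed behaviour: always take protocolType from the persisted record. */
    static Group loadProposed(Group stored) {
        return new Group(stored.groupId, stored.protocolType, stored.members);
    }

    /** Models the --list filter: only groups of type 'consumer' are shown. */
    static List<String> listConsumerGroups(Map<String, Group> cache) {
        List<String> visible = new ArrayList<>();
        for (Group g : cache.values()) {
            if ("consumer".equals(g.protocolType)) {
                visible.add(g.groupId);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // An empty group whose persisted protocol type is still 'consumer'.
        Group stored = new Group("my-group", "consumer", Collections.<String>emptyList());

        Map<String, Group> cacheAfterFailover = new LinkedHashMap<>();
        cacheAfterFailover.put(stored.groupId, loadCurrent(stored));
        System.out.println("current:  " + listConsumerGroups(cacheAfterFailover));

        cacheAfterFailover.put(stored.groupId, loadProposed(stored));
        System.out.println("proposed: " + listConsumerGroups(cacheAfterFailover));
    }
}
```

In this model the empty group disappears from the listing under the current load path and stays visible under the proposed one.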
In general, things can get confusing when distinguishing between 'consumer' and non-consumer groups. For example, DescribeGroups and OffsetFetchRequest do not filter on protocol type, so it's possible for kafka-consumer-groups.sh to describe groups (--describe) without actually being able to list them. In the protocol guide, OffsetFetchRequest / OffsetCommitRequest have their parameters listed as 'ConsumerGroup', but in reality these can be used for groups of unknown type as well. For instance, a consumer group can be copied by finding a coordinator (GroupCoordinatorRequest / FindCoordinatorRequest) and sending an OffsetCommitRequest. The group will be auto-created with an empty protocol type. Although this is arguably correct, the group will now exist but will not be a proper 'consumer' group until later, when there is a JoinGroupRequest. Again, this can be confusing as far as categorization / visibility of the group is concerned. A group can instead be copied more directly by creating a consumer and calling commitSync (as kafka-consumer-groups.sh does), but this involves extra connections / requests and so is a little slower when trying to keep a large number of groups in sync in real time across clusters.
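The visibility gap described above can be sketched with another toy model. All names here (GroupVisibilitySketch, offsetCommit, joinGroup, describe, list) are hypothetical stand-ins for the coordinator's handling of the corresponding requests, not real Kafka APIs, and the model assumes that --list filters on the 'consumer' protocol type while --describe does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a stand-in for the coordinator, not Kafka's real API.
final class GroupVisibilitySketch {
    // groupId -> protocolType; the empty string models UNKNOWN_PROTOCOL_TYPE.
    private final Map<String, String> protocolTypes = new LinkedHashMap<>();

    /** Models OffsetCommitRequest: lazily creates the group with an empty protocol type. */
    void offsetCommit(String groupId) {
        protocolTypes.putIfAbsent(groupId, "");
    }

    /** Models a JoinGroupRequest from a consumer: the group becomes type 'consumer'. */
    void joinGroup(String groupId) {
        protocolTypes.put(groupId, "consumer");
    }

    /** Models --describe: any existing group is found, whatever its protocol type. */
    boolean describe(String groupId) {
        return protocolTypes.containsKey(groupId);
    }

    /** Models --list: only groups of type 'consumer' are returned. */
    List<String> list() {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, String> e : protocolTypes.entrySet()) {
            if ("consumer".equals(e.getValue())) {
                visible.add(e.getKey());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        GroupVisibilitySketch coordinator = new GroupVisibilitySketch();

        coordinator.offsetCommit("copied-group");
        System.out.println("describable: " + coordinator.describe("copied-group"));
        System.out.println("listed:      " + coordinator.list());

        coordinator.joinGroup("copied-group");
        System.out.println("listed after join: " + coordinator.list());
    }
}
```

In this model a group created only through offset commits can be described but is not listed until a member joins.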
If we want to make it easier to keep consumer groups consistent, there are two options. One is to allow groups to be explicitly created with a protocol type instead of only lazily. The other is to give the coordinator a broker config that sets a default protocol type for groups created outside of JoinGroupRequest; this would default to 'consumer'. This sort of setting might be confusing for users though, since implicitly created groups are certainly not the norm.
Issue Links
- incorporates
  - KAFKA-6421 consumer group can't show when use kafka-consumer-groups tool if the leader of __consumer_offsets partition for this group changed (Resolved)
  - KAFKA-6336 when using assign() with kafka consumer the KafkaConsumerGroup command doesnt show those consumers (Resolved)
- relates to
  - KAFKA-6434 Kafka-consumer-groups.sh reset-offsets does not work properly for not existing group (Open)