Kafka guarantees at-least-once delivery of messages. The high-level consumer provides highly available, partitioned consumption of data within a consumer group. When a broker or a consumer in the group fails, the high-level consumer rebalances, redistributing the topic partitions evenly among the consumers in the group. With the current design, this rebalancing operation can introduce duplicates in the consumed data.
This JIRA improves the rebalancing operation and the consumer iterator design to guarantee zero duplicates when consuming uncompressed topics. Compressed topics may still yield a small number of duplicates after a rebalance, but that number is bounded by the compression batch size.
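The duplicate counts described above can be illustrated with a toy offset model. This is only a sketch under stated assumptions, not Kafka's actual implementation: it assumes that duplicates arise because the new owner of a partition resumes from an offset committed before the old owner stopped consuming, and that for compressed data an offset can only be committed at the start of a compressed batch. The function names and the numbers are hypothetical.

```python
def duplicates_after_rebalance(consumed, committed):
    """Messages re-delivered when a new owner resumes from `committed`.

    `consumed` is the offset of the next message the old owner would read;
    `committed` is the offset it last persisted before giving up the partition.
    """
    return max(0, consumed - committed)


def committable_offset(consumed, batch_size=1):
    """Largest committable offset that does not exceed `consumed`.

    Assumption: with compression, only batch boundaries are committable,
    so the offset is rounded down to a multiple of `batch_size`.
    A batch size of 1 models an uncompressed topic.
    """
    return (consumed // batch_size) * batch_size


consumed = 1037       # old owner has read offsets 0..1036
stale_commit = 1000   # last commit happened some time before the rebalance

# Resuming from a stale commit re-delivers everything read since then.
print(duplicates_after_rebalance(consumed, stale_commit))  # 37

# Committing the exact consumed offset before releasing the partition
# leaves nothing to re-deliver on an uncompressed topic.
print(duplicates_after_rebalance(consumed, committable_offset(consumed)))  # 0

# With a compression batch of 10 messages, the commit rounds down to 1030,
# so at most batch_size - 1 messages are re-delivered.
print(duplicates_after_rebalance(consumed, committable_offset(consumed, 10)))  # 7
```

Under these assumptions, the number of duplicates on a compressed topic is always smaller than the batch size, which matches the bound stated above.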