[FLINK-33153] Kafka using latest-offset maybe missing data - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: Connectors / Kafka
Labels: None

Description

When the Kafka source starts with the latest-offset strategy, it does not look up the current end offset of each partition and begin consuming from that concrete offset. Instead, it sets the split's startingOffset to -1 (KafkaPartitionSplit.LATEST_OFFSET). As a result, currentOffset is -1 and the reader calls KafkaConsumer's seekToEnd API. currentOffset is only updated to the consumed offset + 1 when the task actually consumes a record, and this currentOffset is what gets stored in the state at checkpoint time.

If a topic receives very few messages, this can lose data:

1. Start the job with the latest-offset strategy. One partition receives no data, so its currentOffset in state stays at -1.
2. Stop the job with a savepoint.
3. Write data to that partition.
4. Restart the job from the savepoint.

The task resumes from the saved state, but because the restored startingOffset is still -1, the reader seeks to the end of the partition again and skips the records that were written between the savepoint and the restart.
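The scenario above can be reproduced with a small in-memory simulation. This is a hypothetical sketch, not Flink code: `Partition`, `SplitState`, `Reader` and `runScenario` are made-up names, and seeking to the end is modeled by reading the partition's current size. It assumes only what the description states: the sentinel -1 stays in the checkpointed state until a record is consumed, and the consumer's fetch position after seekToEnd is not checkpointed. For contrast, it also shows resolving "latest" to a concrete offset before the first checkpoint. That variant is one possible mitigation, not necessarily how Flink addressed the issue.

```java
import java.util.ArrayList;
import java.util.List;

public class LatestOffsetRestoreDemo {

    // Hypothetical sentinel mirroring KafkaPartitionSplit.LATEST_OFFSET (-1): "seek to end at start".
    static final long LATEST_OFFSET = -1L;

    /** Hypothetical in-memory stand-in for a single Kafka partition. */
    static final class Partition {
        final List<String> records = new ArrayList<>();
        long endOffset() { return records.size(); }
        void append(String value) { records.add(value); }
    }

    /** Hypothetical split state: the only value written to a checkpoint/savepoint. */
    static final class SplitState {
        long currentOffset;
        SplitState(long startingOffset) { this.currentOffset = startingOffset; }
    }

    /** Hypothetical reader: its fetch position is NOT part of the checkpointed state. */
    static final class Reader {
        final Partition partition;
        final SplitState split;
        long position;

        Reader(Partition partition, SplitState split) {
            this.partition = partition;
            this.split = split;
            // Like seekToEnd for the sentinel: the position is resolved, but the state keeps -1.
            this.position = split.currentOffset == LATEST_OFFSET
                    ? partition.endOffset()
                    : split.currentOffset;
        }

        /** Consumes everything available; state only changes when a record is consumed. */
        List<String> poll() {
            List<String> seen = new ArrayList<>();
            while (position < partition.endOffset()) {
                seen.add(partition.records.get((int) position));
                position++;
                split.currentOffset = position; // consumed offset + 1
            }
            return seen;
        }

        long snapshot() { return split.currentOffset; }
    }

    /** Returns the records the restored reader sees after a stop-with-savepoint and restart. */
    static List<String> runScenario(boolean resolveLatestEagerly) {
        Partition partition = new Partition();
        partition.append("old-1");
        partition.append("old-2");

        long start = resolveLatestEagerly ? partition.endOffset() : LATEST_OFFSET;
        Reader first = new Reader(partition, new SplitState(start));
        first.poll();                          // no new data, so nothing is consumed
        long savepointOffset = first.snapshot(); // -1 with the sentinel, 2 if resolved eagerly

        // The job is stopped; data arrives while it is down.
        partition.append("new-1");
        partition.append("new-2");

        Reader restored = new Reader(partition, new SplitState(savepointOffset));
        return restored.poll();
    }

    public static void main(String[] args) {
        System.out.println("sentinel kept in state:   " + runScenario(false));
        System.out.println("offset resolved up front: " + runScenario(true));
    }
}
```

With the sentinel, the restored reader seeks to the end again and returns an empty list. With the offset resolved up front, it returns `[new-1, new-2]`.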

People

Assignee: Unassigned

Reporter: tanjialiang

Votes: 0

Watchers: 1

Dates

Created: 25/Sep/23 09:25

Updated: 18/Dec/23 09:31

Resolved: 25/Sep/23 10:51