Flink / FLINK-33153

Kafka using latest-offset may miss data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Connectors / Kafka
    • Labels: None

    Description

      When the Kafka source starts with the latest-offset strategy, it does not look up the partition's current end offset and record it as the position to consume from. Instead, it sets startingOffset to -1 (KafkaPartitionSplit.LATEST_OFFSET), which makes currentOffset = -1, and calls the KafkaConsumer's seekToEnd API. The currentOffset is only set to the consumed offset + 1 once the task actually consumes data, and this currentOffset is what gets stored in state during checkpointing. If there are very few messages in Kafka and a partition has not consumed any data, and I stop the task with a savepoint, then write data to that partition, and then start the task from the savepoint, the task resumes from the saved state. Because the startingOffset in the state is still -1, the reader seeks to the end again on restore, so the task misses the data that was written after the savepoint and before the restart.
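The sequence above can be reproduced in a small toy model. This is only a sketch and not Flink or Kafka code: `Partition`, `Reader`, and `recordsSeenAfterRestore` are hypothetical names introduced for illustration, and the seek-to-end step is simplified to happen when the reader is created. Only the marker value -1 comes from the report. The "resolve eagerly" variant is one possible remedy assumed here for comparison; it is not necessarily how the duplicate issue was resolved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition and one reader; illustrates the report, not Flink internals. */
public class LatestOffsetSketch {
    // Same marker value the report attributes to KafkaPartitionSplit.LATEST_OFFSET.
    static final long LATEST_OFFSET = -1L;

    /** Hypothetical stand-in for a partition log: a record's offset is its list index. */
    static final class Partition {
        private final List<String> records = new ArrayList<>();
        void append(String value) { records.add(value); }
        long endOffset() { return records.size(); }
        String get(long offset) { return records.get((int) offset); }
    }

    /** Hypothetical reader that tracks a fetch position and a checkpointed offset. */
    static final class Reader {
        private final Partition partition;
        private long position;      // where the next fetch starts
        private long currentOffset; // what a checkpoint or savepoint would store

        Reader(Partition partition, long startingOffset) {
            this.partition = partition;
            this.currentOffset = startingOffset;
            // The marker means "seek to end"; any other value is a concrete offset.
            this.position = (startingOffset == LATEST_OFFSET)
                    ? partition.endOffset()
                    : startingOffset;
        }

        List<String> poll() {
            List<String> out = new ArrayList<>();
            while (position < partition.endOffset()) {
                out.add(partition.get(position));
                position++;
                currentOffset = position; // consumed offset + 1, only set on consumption
            }
            return out;
        }

        long snapshot() { return currentOffset; }
    }

    /** Runs the savepoint/restore scenario and returns what the restored reader sees. */
    public static List<String> recordsSeenAfterRestore(boolean resolveEagerly) {
        Partition partition = new Partition();
        partition.append("a");
        partition.append("b");

        // Either store the marker, or resolve it to the concrete end offset up front.
        long start = resolveEagerly ? partition.endOffset() : LATEST_OFFSET;
        Reader first = new Reader(partition, start);
        first.poll();                  // nothing new arrives before the savepoint
        long saved = first.snapshot(); // -1 with the marker, 2 when resolved eagerly

        partition.append("c");         // written while the job is stopped

        Reader restored = new Reader(partition, saved);
        return restored.poll();
    }

    public static void main(String[] args) {
        System.out.println("marker stored:   " + recordsSeenAfterRestore(false)); // []
        System.out.println("offset resolved: " + recordsSeenAfterRestore(true));  // [c]
    }
}
```

With the marker stored in state, the restored reader seeks to the new end of the partition and skips "c"; with a concrete offset stored, it resumes at offset 2 and reads "c".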

          People

            Assignee: Unassigned
            Reporter: tanjialiang
            Votes: 0
            Watchers: 1
