Kafka / KAFKA-12951

Infinite loop while restoring a GlobalKTable

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.7.2, 2.8.1, 3.0.0
    • Component/s: streams
    • Labels: None

    Description

      We encountered an issue a few times in some of our Kafka Streams applications.
      After an unexpected restart of our applications, some instances have not been able to resume operating.

      They got stuck while trying to restore the state store of a GlobalKTable. The only way to resume operating was to manually delete their `state.dir`.

      We observed the following timeline:

      • The pod restarts, and we encounter the same issue until we manually delete the state.dir

      Regarding the topic, using the DumpLogSegments tool, I can see:

      • Offset 381 - Last business message received
      • Offset 382 - Txn COMMIT (last message)

      I think the real culprit is that the checkpoint is 383 instead of 382. Note that the global topic is a transactional topic.
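
      The gap between the two candidate offsets can be sketched with a toy model. This is only an illustration of the arithmetic: the class and method names (OffsetGapSketch, Entry, logEndOffset, lastRecordOffsetPlusOne, reportedTail) are hypothetical and are not Kafka APIs, and the sketch makes no claim about which value Kafka Streams actually writes to its checkpoint file. It only assumes that a transaction marker occupies an offset in the log but is never handed to the application as a record.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not Kafka APIs.
final class OffsetGapSketch {

    // One log entry: its offset and whether it is a transaction control marker.
    static final class Entry {
        final long offset;
        final boolean controlMarker;

        Entry(long offset, boolean controlMarker) {
            this.offset = offset;
            this.controlMarker = controlMarker;
        }
    }

    // Offset that the next appended entry would get (last offset of any kind + 1).
    static long logEndOffset(List<Entry> log) {
        return log.isEmpty() ? 0L : log.get(log.size() - 1).offset + 1;
    }

    // Offset right after the last entry that is handed to the application.
    // Control markers occupy an offset but are never returned as records.
    static long lastRecordOffsetPlusOne(List<Entry> log) {
        long next = 0L;
        for (Entry e : log) {
            if (!e.controlMarker) {
                next = e.offset + 1;
            }
        }
        return next;
    }

    // Tail of the topic as reported in this issue.
    static List<Entry> reportedTail() {
        List<Entry> log = new ArrayList<>();
        log.add(new Entry(381, false)); // last business message
        log.add(new Entry(382, true));  // Txn COMMIT marker
        return log;
    }

    public static void main(String[] args) {
        List<Entry> log = reportedTail();
        System.out.println("last record offset + 1 = " + lastRecordOffsetPlusOne(log)); // 382
        System.out.println("log end offset         = " + logEndOffset(log));            // 383
    }
}
```

      In this model, any logic that counts only returned records ends at 382, while anything based on the end of the log ends at 383. That one-offset difference is the size of the discrepancy described above.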

      While experimenting with the API, I found that the consumer.position() call is a bit tricky. After a seek() and a poll(), position() seems to return the seek position. After the poll() call, even if no data is returned, position() returns the LSO (last stable offset). I put together an example at https://gist.github.com/Dabz/9aa0b4d1804397af6e7b6ad8cba82dcb .

            People

              Assignee: Matthias J. Sax (mjsax)
              Reporter: Damien Gasparina (Dabz)
              Votes: 1
              Watchers: 4
