  1. Kafka
  2. KAFKA-9199

Improve handling of out of sequence errors lower than last acked sequence

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: producer
    • Labels: None

    Description

      The broker attempts to cache the state of the last 5 batches in order to enable duplicate detection. This caching is not guaranteed across restarts: we only write the state of the last batch to the snapshot file. It is possible in some cases for this to result in a sequence such as the following:

      1. Send sequence=n
      2. Sequence=n successfully written, but response is not received
      3. Leader changes after broker restart
      4. Send sequence=n+1
      5. Receive successful response for n+1
      6. Sequence=n times out and is retried, which results in an out-of-order sequence error

      There are a couple of problems here. First, it would probably be better for the broker to return DUPLICATE_SEQUENCE_NUMBER when it receives a sequence number lower than that of any cached batch. Second, the producer currently handles this situation by retrying until the delivery timeout expires. Instead, it should simply fail the batch.
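
      The first suggestion can be illustrated with a small toy model. This is only a sketch, not Kafka's broker code: the class ProducerStateModel, the flag treat_older_as_duplicate, and the helper retry_after_restart are hypothetical names invented for the illustration. The model assumes the cache keeps sequence numbers only, that a restart retains just the last batch, and it ignores sequence wrap-around. It does not replay the exact leader-change steps above; it shows only the end state in which the cached entry for an older batch is gone when its retry arrives.

```python
from collections import deque
from enum import Enum


class Result(Enum):
    OK = "OK"
    DUPLICATE_SEQUENCE_NUMBER = "DUPLICATE_SEQUENCE_NUMBER"
    OUT_OF_ORDER_SEQUENCE_NUMBER = "OUT_OF_ORDER_SEQUENCE_NUMBER"


class ProducerStateModel:
    """Toy per-producer sequence tracker (hypothetical, not Kafka's class)."""

    MAX_CACHED = 5  # the description says the last 5 batches are cached

    def __init__(self, treat_older_as_duplicate):
        # Sequence numbers of the most recently appended batches.
        self.cached = deque(maxlen=self.MAX_CACHED)
        # True models the proposed behavior, False the behavior described.
        self.treat_older_as_duplicate = treat_older_as_duplicate

    def restart(self):
        # Assumption from the description: only the last batch is snapshotted.
        self.cached = deque(list(self.cached)[-1:], maxlen=self.MAX_CACHED)

    def append(self, seq):
        if seq in self.cached:
            return Result.DUPLICATE_SEQUENCE_NUMBER
        if self.cached:
            # Proposed check: anything older than every cached batch
            # must already have been written, so report a duplicate.
            if self.treat_older_as_duplicate and seq < min(self.cached):
                return Result.DUPLICATE_SEQUENCE_NUMBER
            if seq != self.cached[-1] + 1:
                return Result.OUT_OF_ORDER_SEQUENCE_NUMBER
        self.cached.append(seq)
        return Result.OK


def retry_after_restart(treat_older_as_duplicate):
    """Write batches 10 and 11, lose all but the last, then retry 10."""
    state = ProducerStateModel(treat_older_as_duplicate)
    state.append(10)
    state.append(11)
    state.restart()          # the cache now holds only sequence 11
    return state.append(10)  # late retry of the older batch


print(retry_after_restart(False).name)  # behavior described in this issue
print(retry_after_restart(True).name)   # behavior proposed in this issue
```

      In this model the retry of the older batch is reported as OUT_OF_ORDER_SEQUENCE_NUMBER without the extra check and as DUPLICATE_SEQUENCE_NUMBER with it, which would give the producer a definite signal that the batch was already written.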

      This issue surfaced in the reassignment system test. It caused the test to fail because the producer kept retrying the duplicate batch until it finally gave up.

          People

            Assignee: Unassigned
            Reporter: Jason Gustafson (hachikuji)