Kafka / KAFKA-14315

KRaft: in a 1-broker setup, the broker took 34 seconds to transition from PrepareCommit to CompleteCommit


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: kraft
    • Labels: None

    Description

      I'm still looking into a PR failure in my client and noticed something a bit strange. I know that, technically, I should be using RequireStableFetchOffsets in my transaction tests to guard against rebalances that happen while a transaction is not yet finalized. I'll be adding that.

      However, these tests have never failed against ZooKeeper mode. The client goes to great lengths to avoid needing KIP-447 behavior, and the assumption with localhost testing is that things run fast enough (and that there are enough guards) that problems would not be encountered.

      That appears not to hold with a KRaft broker, and looking at __transaction_state, the following looks especially problematic:

       

      __transaction_state partition 33 offset 7 at [2022-10-18 11:15:37.821]
      TxnMetadataKey(0) 9f87dc04dc3f4d5b15ef3072c531cf46327278307df8e149fa966462cd40c10b
      TxnMetadataValue(0)
            ProducerID           41
            ProducerEpoch        0
            TimeoutMillis        120000
            State                PrepareCommit
            Topics               __consumer_offsets=>[13] e7c7d971626fbaf4bfb33975e57089167939e6acabb4c4fc534eb148462e45cc=>[4 5 12 16]  
            LastUpdateTimestamp  1666113337821
            StartTimestamp       1666113335311
      __transaction_state partition 33 offset 8 at [2022-10-18 11:16:11.419]
      TxnMetadataKey(0) 9f87dc04dc3f4d5b15ef3072c531cf46327278307df8e149fa966462cd40c10b
      TxnMetadataValue(0)
            ProducerID           41
            ProducerEpoch        0
            TimeoutMillis        120000
            State                CompleteCommit
            Topics     
            LastUpdateTimestamp  1666113337821
            StartTimestamp       1666113335311

       

      I've captured that using my kcl tool.

      Note that the transaction enters PrepareCommit at 11:15:37.821 and then enters CompleteCommit at 11:16:11.419. AFAICT, this means that in my single-node KRaft setup, the broker took about 34 seconds (33.598 s) to transition commit states internally.

      I noticed this in tests because a rebalance happened during those 34 seconds, which caused duplicate consumption: the transactional offset commits were not yet finalized, so the old commits were picked up.

      This ticket is related to KAFKA-14312: this failure is cropping up now that I've worked around KAFKA-14312 within the client itself.

          People

            Assignee: Unassigned
            Reporter: Travis Bischel (twmb)