KAFKA-17380

Kafka Streams: a few partitions stuck in processing, fixed after restart


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.2
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      We are using Kafka Streams 2.6.2 and running stateful aggregations with exactly-once semantics.

      The processing logic is:

      consume input records -> aggregate intermediate results and buffer them in a state store backed by a changelog topic -> punctuate every 15 seconds to flush the state store and send the aggregated records downstream -> perform the final aggregation and send the results to the output topic
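For readers unfamiliar with the pattern, the buffer-then-flush step described above can be modelled without any Kafka dependency. This is only a rough sketch under stated assumptions: the class and method names are hypothetical, a plain in-memory map stands in for the changelog-backed state store, and `punctuate()` stands in for the 15-second punctuation. It is not the reporter's actual code and does not use the Kafka Streams API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Library-free model of "buffer partial aggregates, flush on a timer".
// All names are hypothetical; this is not Kafka Streams API.
class BufferedAggregatorSketch {
    // Stands in for the state store backed by a changelog topic.
    private final Map<String, Long> buffer = new LinkedHashMap<>();

    // Stands in for per-record processing: fold the value into a per-key partial sum.
    void process(String key, long value) {
        buffer.merge(key, value, Long::sum);
    }

    // Stands in for the periodic punctuation: hand all partial aggregates
    // downstream (here: return them) and clear the buffer.
    Map<String, Long> punctuate() {
        Map<String, Long> flushed = new LinkedHashMap<>(buffer);
        buffer.clear();
        return flushed;
    }

    public static void main(String[] args) {
        BufferedAggregatorSketch agg = new BufferedAggregatorSketch();
        agg.process("a", 1);
        agg.process("a", 2);
        agg.process("b", 5);
        System.out.println(agg.punctuate()); // {a=3, b=5}
        System.out.println(agg.punctuate()); // {} (buffer was cleared by the first flush)
    }
}
```

In the real application the flush and the downstream sends presumably happen inside a transaction, which is why a fenced producer during the flush matters for this report.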

      Since we run on spot instances, one of the pods was restarted. This triggered a rebalance, and state was being restored from the changelog topic.

      We noticed ProducerFencedException errors:

      org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.

      After this, a few partitions were stuck and no records were processed until we restarted the application.

      We had configured:
       
      transaction.timeout.ms to 30 seconds

      session.timeout.ms to 30 seconds
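As a point of reference, the settings above might be expressed roughly as follows. This is a sketch, not the reporter's configuration: the `application.id` and `bootstrap.servers` values are placeholders, and it assumes the two timeouts were passed through Kafka Streams' `producer.` and `consumer.` client prefixes, which the report does not state. Plain string keys are used so the snippet compiles without the Kafka client libraries.

```java
import java.util.Properties;

// Hypothetical reconstruction of the reported settings using plain string keys.
class EosStreamsConfigSketch {
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "aggregation-app");      // placeholder, not from the report
        props.put("bootstrap.servers", "localhost:9092");    // placeholder, not from the report
        props.put("processing.guarantee", "exactly_once");   // exactly-once semantics, as reported
        // Assumption: timeouts applied via the embedded-client prefixes.
        props.put("producer.transaction.timeout.ms", "30000"); // 30 seconds
        props.put("consumer.session.timeout.ms", "30000");     // 30 seconds
        return props;
    }

    public static void main(String[] args) {
        // Print each setting, one per line.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With the Kafka libraries on the classpath, the same keys can be built with `StreamsConfig.producerPrefix(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG)` and `StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG)` instead of hand-written strings.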

      Could you please advise whether there is a known fix for this edge case?

          People

            Assignee: Unassigned
            Reporter: Rohit Bobade (rohitbobade)
            Votes: 0
            Watchers: 2
