  1. Kafka
  2. KAFKA-17445

Kafka streams keeps rebalancing with the following reasons

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.8.0
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      We recently upgraded Kafka Streams to 3.8.0. Since the upgrade, the Streams app keeps rebalancing and does not process any events.

      We have explicitly set GROUP_INSTANCE_ID_CONFIG (group.instance.id), so the consumers join the group as static members.
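
      For readers trying to reproduce this, the following is a minimal sketch of how static membership is typically enabled for a Streams app. It uses plain string keys so it compiles without the Kafka jars; "group.instance.id" is the string value of ConsumerConfig.GROUP_INSTANCE_ID_CONFIG. The class name, application id, bootstrap address and instance id are all placeholders and are not taken from the reporter's setup.

```java
import java.util.Properties;

public class StaticMembershipConfig {

    // Builds Streams properties with static membership enabled.
    // All argument values are hypothetical; the report does not show the real ones.
    static Properties build(String applicationId, String bootstrapServers, String instanceId) {
        Properties props = new Properties();
        props.put("application.id", applicationId);       // StreamsConfig.APPLICATION_ID_CONFIG
        props.put("bootstrap.servers", bootstrapServers); // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG
        // ConsumerConfig.GROUP_INSTANCE_ID_CONFIG: a non-empty id makes the member static.
        props.put("group.instance.id", instanceId);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("example-streams-app", "localhost:9092", "example-instance-1");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

      The resulting Properties object would then be passed to the KafkaStreams constructor. Leaving out the group.instance.id entry corresponds to the second scenario described in this report (dynamic membership).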

      This is what we see in the broker logs:

      [GroupCoordinator 2]: Preparing to rebalance group {consumer-group-name} in state PreparingRebalance with old generation 24781 (__consumer_offsets-29) (reason: Updating metadata for static member {} with instance id {}; client reason: rebalance failed due to UnjoinedGroupException)

      We also tried removing GROUP_INSTANCE_ID_CONFIG, but the app still keeps rebalancing without processing anything, and we then see these logs:

      sessionTimeoutMs=45000, rebalanceTimeoutMs=1800000, supportedProtocols=List(stream)) has left group {groupId} through explicit `LeaveGroup`; client reason: the consumer unsubscribed from all topics (kafka.coordinator.group.GroupCoordinator)

      Other log lines show:

      during Stable; client reason: need to revoke partitions and re-join)

      client reason: triggered followup rebalance scheduled for 0
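
      Since every excerpt above carries a "client reason: ..." field, tallying those reasons across a larger log sample may help show which trigger dominates. Below is a rough sketch of such a tally. The class and method names are made up for illustration, and the regex is an assumption based only on the excerpts in this report: it cuts the reason at the first parenthesis, so a reason that itself contains parentheses would be truncated.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClientReasonExtractor {

    // Captures the text after "client reason: " up to an opening or closing
    // parenthesis, or the end of the line, whichever comes first.
    private static final Pattern CLIENT_REASON =
            Pattern.compile("client reason: (.*?)(?:\\s*\\(|\\)|$)");

    // Returns the client reason found in one log line, if any.
    static Optional<String> clientReason(String logLine) {
        Matcher m = CLIENT_REASON.matcher(logLine);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    // Counts how often each client reason occurs, in first-seen order.
    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            clientReason(line).ifPresent(reason -> counts.merge(reason, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines copied from the excerpts in this report.
        List<String> lines = List.of(
                "during Stable; client reason: need to revoke partitions and re-join)",
                "client reason: triggered followup rebalance scheduled for 0",
                "during Stable; client reason: need to revoke partitions and re-join)");
        tally(lines).forEach((reason, count) -> System.out.println(count + " x " + reason));
    }
}
```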

      In the application logs we see:

      1. State being restored from the changelog topic

      2. INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread  at state RUNNING: partitions  lost due to missed rebalance.

      Detected that the thread is being fenced. This implies that this thread missed a rebalance and dropped out of the consumer group. Will close out all assigned tasks and rejoin the consumer group.

       

      3. TaskMigratedException errors:

      org.apache.kafka.streams.errors.TaskMigratedException: Error encountered sending record to topic
      org.apache.kafka.common.errors.InvalidProducerEpochException: Producer with transactionalId

      attempted to produce with an old epoch

      Written offsets would not be recorded and no more records would be sent since the producer is fenced, indicating the task may be migrated out; it means all tasks belonging to this thread should be migrated.

      at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:306) ~[kafka-streams-3.8.0.jar:?]

      at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.lambda$send$1(RecordCollectorImpl.java:286) ~[kafka-streams-3.8.0.jar:?]

      at datadog.trace.instrumentation.kafka_clients.KafkaProducerCallback.onCompletion(KafkaProducerCallback.java:44) ~[?:?]

      at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:1106) ~[kafka-clients-3.8.0.jar:?]

          People

            Assignee: Unassigned
            Reporter: Rohit Bobade (rohitbobade)