Currently in Kafka Streams, we create stream tasks upon getting newly assigned partitions in rebalance callback function
, which involves initialization of the processor state stores as well (including opening the rocksDB, restore the store from changelog, etc, which takes time).
With a large number of state stores, the initialization time itself could take tens of seconds, which usually is larger than the consumer session timeout. As a result, when the callback is completed, the consumer is already treated as failed by the coordinator and rebalance again.
We need to consider if we can optimize the initialization process, or move it out of the callback function, and while initializing the stores one-by-one, use poll call to send heartbeats to avoid being kicked out by coordinator.