Currently, when a Kafka Streams (KS) cluster is expanded, the partitions assigned to the new node are unavailable during the rebalance. For large state stores this can take a very long time, and even for small state stores a pause of more than a few ms can be a deal breaker for microservice use cases.
One workaround would be to execute the rebalance in two phases:
1) Start building the state stores on the new node while the existing nodes keep serving the affected partitions.
2) Only once the state stores are fully populated on the new node, rebalance the tasks. There will still be a rebalance pause, but it would be greatly reduced.
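To illustrate why the two phases should shorten the pause, here is a rough back-of-the-envelope model in plain Java. It is not Kafka Streams code: the class, the method names, the constant restore rate and the "acceptable lag" threshold are all hypothetical and exist only for this sketch. It assumes the new node restores a fixed number of changelog records per tick and that the partition is unavailable only while the new owner replays whatever it has not yet restored.

```java
// Hypothetical model of the two-phase idea; none of these names exist in Kafka Streams.
final class TwoPhaseRebalanceSketch {

    /** Changelog records the new node can restore per simulated tick (assumed constant). */
    static final long RESTORE_RATE = 1_000;

    /** Ticks needed to replay the given number of changelog records (rounded up). */
    static long ticksToRestore(long records) {
        return (records + RESTORE_RATE - 1) / RESTORE_RATE;
    }

    /** Current behaviour: ownership moves first, so the whole restore is downtime. */
    static long singlePhaseDowntime(long changelogEndOffset) {
        return ticksToRestore(changelogEndOffset);
    }

    /**
     * Proposed behaviour: the old owner keeps serving while the new node restores.
     * Ownership moves only once the remaining lag is at most acceptableLag,
     * so only that remaining lag counts as downtime.
     */
    static long twoPhaseDowntime(long changelogEndOffset, long writesPerTick, long acceptableLag) {
        if (writesPerTick < 0 || writesPerTick >= RESTORE_RATE || acceptableLag < 0) {
            // Otherwise the new node could never catch up in this model.
            throw new IllegalArgumentException("need 0 <= writesPerTick < RESTORE_RATE and acceptableLag >= 0");
        }
        long end = changelogEndOffset; // keeps growing, the old owner is still processing
        long restored = 0;             // progress of the new node
        // Phase 1: background restore, no downtime.
        while (end - restored > acceptableLag) {
            end += writesPerTick;
            restored = Math.min(end, restored + RESTORE_RATE);
        }
        // Phase 2: hand over the task; only the remaining lag has to be replayed.
        return ticksToRestore(end - restored);
    }

    public static void main(String[] args) {
        System.out.println("single phase: " + singlePhaseDowntime(1_000_000L) + " ticks");
        System.out.println("two phases:   " + twoPhaseDowntime(1_000_000L, 100L, 500L) + " ticks");
    }
}
```

In this model the pause in the two-phase case depends only on the lag that remains at handover, not on the total size of the state store. The real pause would also include the rebalance protocol itself, which the sketch ignores.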
KAFKA-6144 - Allow state stores to serve stale reads during rebalance