Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.0.0
-
None
-
None
Description
Currently in Kafka Streams, if any DSL operation occurs that may modify the keys of the record stream, the stream is flagged for repartitioning. Currently this flag is checked prior to a stream join or an aggregation and if set the stream is piped through a transient repartition topic. This ensures messages with the same key are always co-located in the same partition and hence same stream task and state store.
The same mechanism should be used to trigger repartitioning prior to stream transform, transformValues and process calls that specify one or more state stores.
Currently without the forced repartitioning, for streams where the key has been modified, there is no guarantee the same keys will be processed by the same task which would be what you expect when using a state store. Given that aggregations and joins already automatically make this guarantee it seems inconsistent that transform and process do not provide the same guarantees.
To achieve the same guarantees currently, developers must manually pipe the stream through a topic to force the repartitioning. This works, but is sub-optimal since you don't get the handy optimisation where the repartition topic contents is purged after use.
Attachments
Issue Links
- Is contained by
-
KAFKA-8611 Add KStream#repartition operation
- Resolved
- is duplicated by
-
KAFKA-6641 Consider auto repartitioning for Stream.transform() API
- Resolved