[KAFKA-7293] Merge followed by groupByKey/join might violate co-partioning - ASF JIRA

Attach files

Attach Screenshot

Add vote

Voters

Watch issue

Watchers

Create sub-task

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: streams
Labels:
None

Description

The merge() operations can be applied to input KStreams that have a different number of tasks (ie, input topic partitions). For this case, the input topics are not co-partitioned and thus the result KStream is not partitioned even if each input KStream is partitioned by its own.

Because, no "repartitionRequired" flag is set on the input KStreams, the flag is also not set on the output KStream. Hence, if a groupByKey() or join() operation is applied the output KStream, we don't insert a repartition topic. However, repartitioning would be required because the KStream is not partitioned.

We cannot detect this during compile time, because the number or partitions is unknown, and thus, we cannot decide if repartitioning is required or not. However, we can add a runtime check similar to joins() that checks if data is correctly (co-)partitioned and if not, we can raise a runtime exception.

Note, for merge() in contrast to join(), we should only check for co-partitioning, if the merge() is followed by a groupByKey() or join() operations.