Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-7293

Merge followed by groupByKey/join might violate co-partioning

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • streams
    • None

    Description

      The merge() operations can be applied to input KStreams that have a different number of tasks (ie, input topic partitions). For this case, the input topics are not co-partitioned and thus the result KStream is not partitioned even if each input KStream is partitioned by its own.

      Because, no "repartitionRequired" flag is set on the input KStreams, the flag is also not set on the output KStream. Hence, if a groupByKey() or join() operation is applied the output KStream, we don't insert a repartition topic. However, repartitioning would be required because the KStream is not partitioned.

      We cannot detect this during compile time, because the number or partitions is unknown, and thus, we cannot decide if repartitioning is required or not. However, we can add a runtime check similar to joins() that checks if data is correctly (co-)partitioned and if not, we can raise a runtime exception.

      Note, for merge() in contrast to join(), we should only check for co-partitioning, if the merge() is followed by a groupByKey() or join() operations.

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            mjsax Matthias J. Sax

            Dates

              Created:
              Updated:

              Slack

                Issue deployment