Kafka / KAFKA-9455

Consider using TreeMap for in-memory stores of Streams


Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • streams

    Description

      From A. Sophie Blee-Goldman: It's worth noting that it might be a good idea to switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap allows us to safely perform range queries without copying over the entire keyset, but its performance on point queries seems to scale noticeably worse with the number of unique keys. Point queries are used by aggregations while range queries are used by windowed joins, but both are available within the PAPI and for interactive queries, so it's hard to say which we should prefer. Maybe, rather than make that tradeoff, we should have one version for efficient range queries (a "JoinWindowStore") and one for efficient point queries (an "AggWindowStore"), or something similar. I know we've had similar thoughts about a different RocksDB store layout for joins (although I can't find that ticket anywhere). It seems like the in-memory stores could benefit from a special "Join" version as well. cc/ Guozhang Wang
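To make the tradeoff above concrete, here is a minimal sketch of how a range query might look against each map type. The class and method names are hypothetical and are not taken from the Streams code base; the sketch assumes a simple timestamp-to-value map and an external lock for the TreeMap case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; not Kafka Streams code.
class RangeQuerySketch {

    // ConcurrentSkipListMap: the sub-map view is weakly consistent, so it can be
    // iterated while other threads write, without copying the key set up front.
    static List<String> rangeFromSkipList(ConcurrentSkipListMap<Long, String> store,
                                          long from, long to) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : store.subMap(from, true, to, true).entrySet()) {
            out.add(e.getValue());
        }
        return out;
    }

    // TreeMap: not thread-safe, so this sketch copies the requested range while
    // holding a lock (an assumed design choice) and returns the snapshot.
    static List<String> rangeFromTreeMap(TreeMap<Long, String> store, Object lock,
                                         long from, long to) {
        synchronized (lock) {
            return new ArrayList<>(store.subMap(from, true, to, true).values());
        }
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, String> skipList = new ConcurrentSkipListMap<>();
        TreeMap<Long, String> treeMap = new TreeMap<>();
        for (long ts = 1; ts <= 4; ts++) {
            skipList.put(ts, "v" + ts);
            treeMap.put(ts, "v" + ts);
        }
        System.out.println(rangeFromSkipList(skipList, 2L, 3L));
        System.out.println(rangeFromTreeMap(treeMap, new Object(), 2L, 3L));
    }
}
```

Both methods return the same values for the same range; the difference is where the cost lands. The skip list pays on every operation for its concurrent structure, while the TreeMap variant pays only when a range is actually requested.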

      Here are some random thoughts:

      1. For Kafka Streams processing logic (i.e. without IQ), it's better to have all processing logic rely on point queries rather than range queries. Right now the only processors that use range queries are, as mentioned above, windowed stream-stream joins. I think we should consider using a different window store implementation for these (and as a result also get rid of the retainDuplicates flag) when refactoring the windowed stream-stream join operation.

      2. With 1), range queries would only be exposed through IQ. Depending on how frequently they are used there, I think it makes a lot of sense to optimize for single-point queries.

      Of course, even without step 1) we should still consider using TreeMap for the windowed in-memory stores so that they scale better with the number of unique keys.
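As a rough illustration of what a TreeMap-backed windowed store optimized for point queries could look like, here is a minimal sketch. The class name, the String/Long types, and the coarse synchronization are all assumptions made for the example; this is not the existing in-memory window store, and it ignores retention, serialization, and duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a point-query-oriented window store; not Kafka Streams code.
class TreeMapWindowStoreSketch {

    // window start timestamp -> (record key -> aggregate value)
    private final TreeMap<Long, TreeMap<String, Long>> windows = new TreeMap<>();

    // Point query: one lookup by window start, one lookup by key.
    synchronized Long fetch(String key, long windowStart) {
        TreeMap<String, Long> window = windows.get(windowStart);
        return window == null ? null : window.get(key);
    }

    synchronized void put(String key, long windowStart, long value) {
        windows.computeIfAbsent(windowStart, ts -> new TreeMap<>()).put(key, value);
    }

    // Range query (the IQ or join-style access path): copies the matching values
    // so the caller can iterate them after the lock is released.
    synchronized List<Long> fetch(String key, long from, long to) {
        List<Long> out = new ArrayList<>();
        for (TreeMap<String, Long> window : windows.subMap(from, true, to, true).values()) {
            Long value = window.get(key);
            if (value != null) {
                out.add(value);
            }
        }
        return out;
    }

    // Windowed count, the aggregation-style pattern: a point read followed by a write.
    synchronized long increment(String key, long windowStart) {
        Long old = fetch(key, windowStart);
        long updated = (old == null) ? 1L : old + 1L;
        put(key, windowStart, updated);
        return updated;
    }
}
```

In this sketch an aggregation only ever touches the point-query path (`increment`), while the range `fetch` pays the cost of a copy, which matches the idea of optimizing for point queries when range queries are comparatively rare.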

          People

            high.lee highluck
            guozhang Guozhang Wang
