Samza / SAMZA-569

Make message offsets an ordered set within a system stream partition

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: container
    • Labels:
      None

      Description

      It would be useful if message offsets formed an ordered set within a system stream partition, i.e., if the offsets of messages from the same partition were monotonically increasing in the order in which the messages are delivered.
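To make the proposed invariant concrete, here is a minimal sketch that checks it for the messages delivered from one partition in a single pass (no replay). It assumes, hypothetically, that offsets can be parsed as `long` values: Samza exposes offsets as strings, and not every system's offsets are numeric. `OffsetOrderCheck` and `isMonotonic` are illustrative names, not Samza APIs.

```java
import java.util.List;

// Hypothetical helper, not part of Samza: checks that offsets delivered from one
// system stream partition increase in delivery order (single pass, no replay).
final class OffsetOrderCheck {
    private OffsetOrderCheck() {}

    // Assumption: offsets are numeric strings (e.g. Kafka offsets), which is
    // not guaranteed for every system Samza consumes from.
    static boolean isMonotonic(List<String> offsetsInDeliveryOrder) {
        Long previous = null;
        for (String offset : offsetsInDeliveryOrder) {
            long current = Long.parseLong(offset);
            if (previous != null && current <= previous) {
                return false; // offset went backwards or repeated
            }
            previous = current;
        }
        return true;
    }
}
```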

      This would enable two features:

      • de-duplication without the need to keep all message offsets
      • determinism when re-calculating the output from a buffered set of messages
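Both features can be sketched as follows, assuming offsets within a partition are comparable `long` values that increase in delivery order (the property this issue proposes). This is a hypothetical illustration, not a Samza API: `OrderedOffsetWindow`, `accept`, and `recompute` are made-up names, and the `String` partition key stands in for a `SystemStreamPartition`. De-duplication needs only one stored number per partition (a high watermark), and keying the buffer by offset makes the replay order independent of insertion order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not a Samza API. Assumes offsets within a partition are
// longs that increase in delivery order.
final class OrderedOffsetWindow {
    // De-dup state: only the highest offset seen so far, per partition.
    private final Map<String, Long> highWatermark = new HashMap<>();
    // Buffer keyed by offset: iteration follows offset order, so a store that
    // iterates in key order is enough; no insertion-order guarantee is needed.
    private final Map<String, SortedMap<Long, String>> buffer = new HashMap<>();

    /** Returns true if the message is new, false if it is a replayed duplicate. */
    boolean accept(String partition, long offset, String message) {
        Long last = highWatermark.get(partition);
        if (last != null && offset <= last) {
            return false; // at or below the watermark: already processed
        }
        highWatermark.put(partition, offset);
        buffer.computeIfAbsent(partition, p -> new TreeMap<>()).put(offset, message);
        return true;
    }

    /** Recomputes the output from the buffer in offset order (deterministic). */
    String recompute(String partition) {
        StringBuilder out = new StringBuilder();
        for (String message : buffer.getOrDefault(partition, new TreeMap<>()).values()) {
            out.append(message);
        }
        return out.toString();
    }
}
```

If a replay re-delivers offsets 1 and 2 after they were processed, `accept` rejects them by comparing against a single watermark per partition, rather than looking them up in a set of every offset ever seen.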

      Currently, without an ordering between message offsets, the window operator needs the following to guarantee de-duplication and determinism:

      • keep all message offsets ever seen in persistent storage to de-duplicate across a replay of arbitrary length, or keep all message offsets within a window if de-duplication is only needed within one window length
      • preserve the insertion order of messages in the buffer, which may also require a persistent KV store that preserves insertion order
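For contrast, the workaround above can be sketched for a single partition with opaque, unordered offsets. This is a hypothetical in-memory illustration (`UnorderedOffsetWindow` is a made-up name, not a Samza class): the de-dup set grows with every message, and the buffer must record insertion order explicitly. In a real job both structures would have to live in persistent storage, which is where the complexity comes from.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for ONE partition whose offsets are opaque strings with
// no usable order.
final class UnorderedOffsetWindow {
    // Every offset must be remembered (or every offset in the window, if
    // de-dup is only needed within one window length).
    private final Set<String> seenOffsets = new HashSet<>();
    // LinkedHashMap keeps insertion order in memory; a persistent buffer would
    // need a KV store offering the same guarantee.
    private final Map<String, String> buffer = new LinkedHashMap<>();

    /** Returns true if the message is new, false if its offset was seen before. */
    boolean accept(String offset, String message) {
        if (!seenOffsets.add(offset)) {
            return false; // duplicate: offset already in the set
        }
        buffer.put(offset, message);
        return true;
    }

    /** Recomputes the output in insertion (delivery) order. */
    String recompute() {
        return String.join("", buffer.values());
    }

    /** Number of offsets retained for de-dup; grows with every new message. */
    int offsetsRetained() {
        return seenOffsets.size();
    }
}
```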

      Both are complicated, and neither would be needed if message offsets were ordered.

              People

              • Assignee:
                Unassigned
                Reporter:
                nickpan47 Yi Pan (Data Infrastructure)
              • Votes:
                0
                Watchers:
                6

                Dates

                • Created:
                  Updated: