  1. Kafka
  2. KAFKA-9647

Add ability to suppress until window end (not close)


Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      Preface: This feature request originates from a recent question on Stack Overflow, in response to which Matthias J. Sax suggested filing a feature request.

      Feature Request: In addition to suppressing updates to a windowed KTable until a window closes, we suggest adding an option that suppresses only "early" results. By early results we mean results computed before the window ends, as opposed to results produced during the grace period. This suppress option would therefore hold back all aggregation results with timestamp < window end, but forward all results with timestamp >= window end and timestamp < window close.
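
      The proposed rule can be written as a small predicate. The following is only a sketch in plain Java, not Kafka Streams code: the class name, the `Decision` enum and the `decide` method are hypothetical, and it assumes that "timestamp" refers to the time at which a result is computed and that window close equals window end plus the grace period.

```java
final class SuppressUntilWindowEndSketch {

    /** Hypothetical outcome names; not part of the Kafka Streams API. */
    enum Decision { SUPPRESS, FORWARD, WINDOW_CLOSED }

    /**
     * Classifies an aggregation result for one window.
     * Assumption: window close = window end + grace period.
     */
    static Decision decide(long resultTimestampMs, long windowEndMs, long graceMs) {
        long windowCloseMs = windowEndMs + graceMs;
        if (resultTimestampMs < windowEndMs) {
            return Decision.SUPPRESS;       // "early" result, held back
        }
        if (resultTimestampMs < windowCloseMs) {
            return Decision.FORWARD;        // result during the grace period
        }
        return Decision.WINDOW_CLOSED;      // window no longer accepts updates
    }

    public static void main(String[] args) {
        long windowEndMs = 60_000L;  // hypothetical one-minute window starting at 0
        long graceMs = 30_000L;      // hypothetical grace period
        for (long ts : new long[] {59_999L, 60_000L, 89_999L, 90_000L}) {
            System.out.println(ts + " -> " + decide(ts, windowEndMs, graceMs));
        }
    }
}
```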

      Use Case: As an example use case, we refer to John Roesler's blog post on the initial introduction of the suppress operator. The post argues that, in the case of alerting, not every intermediate aggregation result should trigger an alert message, but only the "final" result. Otherwise, a "follow-up email telling people to ignore the first message" might be required if an intermediate result causes an alert but the final result does not. Kafka Streams' current solution for this use case is a suppress operation that forwards only the final result, that is, the last result after which no further updates can occur. This is the case once the grace period of a window has passed (the window closes).

      However, ideally we would like to set the grace period as large as possible to allow for very late-arriving messages, which in turn leads to very late alerts. On the other hand, such late-arriving messages are rare in practice, and normally the order of events largely corresponds to the order of messages. Thus, a reasonable option would be to suppress aggregation results only until the window ends (i.e. stream time >= window end) and then forward this "most likely final" result. For the use case of alerting, this means an alert is triggered as soon as we are relatively certain that the recorded data requires one. Only in the rare case that a late update changes our decision would the "follow-up email telling people to ignore the first message" be required. Such rare corrections should be acceptable for many use cases.
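
      To make the trade-off concrete, the following plain-Java sketch compares what would be emitted for a single window under the existing "until window closes" behaviour and under the proposed "until window ends" behaviour. It is a simplified model, not Kafka Streams code: all names are hypothetical, updates are given as `{streamTimeMs, aggregateValue}` pairs sorted by stream time, and stream time is assumed to reach the window end and window close exactly at those boundaries.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowEndVsWindowCloseSketch {

    /** Existing behaviour (simplified): one final result, emitted at window close. */
    static List<String> emitUntilWindowCloses(long[][] updates, long windowEndMs, long graceMs) {
        long windowCloseMs = windowEndMs + graceMs;
        long[] last = null;
        for (long[] u : updates) {
            if (u[0] < windowCloseMs) {
                last = u; // later updates replace earlier ones in the buffer
            }
        }
        List<String> out = new ArrayList<>();
        if (last != null) {
            out.add(format(windowCloseMs, last[1]));
        }
        return out;
    }

    /** Proposed behaviour (simplified): flush at window end, then forward grace-period updates. */
    static List<String> emitUntilWindowEnds(long[][] updates, long windowEndMs, long graceMs) {
        long windowCloseMs = windowEndMs + graceMs;
        List<String> out = new ArrayList<>();
        long[] buffered = null;
        boolean flushed = false;
        for (long[] u : updates) {
            if (u[0] < windowEndMs) {
                buffered = u; // early result, suppressed
                continue;
            }
            if (!flushed) {
                if (buffered != null) {
                    out.add(format(windowEndMs, buffered[1])); // "most likely final" result
                }
                flushed = true;
            }
            if (u[0] < windowCloseMs) {
                out.add(format(u[0], u[1])); // late correction during the grace period
            }
        }
        if (!flushed && buffered != null) {
            out.add(format(windowEndMs, buffered[1]));
        }
        return out;
    }

    static String format(long emitTimeMs, long value) {
        return "t=" + emitTimeMs + " value=" + value;
    }

    public static void main(String[] args) {
        // Hypothetical data: three on-time updates and one late update at t=75000.
        long[][] updates = { {10_000, 1}, {30_000, 2}, {50_000, 3}, {75_000, 4} };
        System.out.println(emitUntilWindowCloses(updates, 60_000L, 30_000L));
        System.out.println(emitUntilWindowEnds(updates, 60_000L, 30_000L));
    }
}
```

      With these hypothetical numbers, the first policy emits a single result at t=90000, while the second emits the value 3 at t=60000 and a correction to 4 at t=75000.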

      Further extension: In addition to suppressing all updates until the window ends and forwarding all updates afterwards, a further extension would be to forward late updates only every x seconds. Maybe the existing `Suppressed.untilTimeLimit( .. )` could be reused for this.
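
      One possible reading of this extension is sketched below in plain Java. It is an assumption about how such rate limiting might behave, loosely modeled on a time-limit buffer: the first late update starts a timer of x milliseconds, later updates replace the buffered value without restarting the timer, and the latest value is emitted when the timer expires. The class and method names are hypothetical, and the sketch ignores the window-close boundary.

```java
import java.util.ArrayList;
import java.util.List;

final class ThrottledLateUpdatesSketch {

    /**
     * lateUpdates: {streamTimeMs, aggregateValue} pairs for one window,
     * all at or after window end, sorted by stream time.
     */
    static List<String> emitLateUpdatesEvery(long[][] lateUpdates, long intervalMs) {
        List<String> out = new ArrayList<>();
        boolean hasPending = false;
        long deadlineMs = 0L;
        long pendingValue = 0L;
        for (long[] u : lateUpdates) {
            if (hasPending && u[0] >= deadlineMs) {
                // Timer expired before this update arrived: emit the buffered value.
                out.add("t=" + deadlineMs + " value=" + pendingValue);
                hasPending = false;
            }
            if (!hasPending) {
                // First update in a new buffering period starts the timer.
                deadlineMs = u[0] + intervalMs;
                hasPending = true;
            }
            pendingValue = u[1]; // newer value replaces the buffered one
        }
        if (hasPending) {
            out.add("t=" + deadlineMs + " value=" + pendingValue);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical late updates after a window end at t=60000, x = 5 seconds.
        long[][] late = { {61_000, 4}, {63_000, 5}, {72_000, 6} };
        System.out.println(emitLateUpdatesEvery(late, 5_000L));
    }
}
```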

          People

            Assignee: Unassigned
            Reporter: Sören Henning (SoerenHenning)
            Votes: 1
            Watchers: 4
