Spark / SPARK-26655

Support multiple aggregates in Structured Streaming append mode


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Currently, Structured Streaming does not support multiple aggregations in a single streaming query.

      However, in append mode an aggregate emits a result only after the watermark passes the relevant threshold (e.g. the window boundary), and once emitted, that value is not affected by later late data. It is therefore possible to chain multiple aggregates in 'Append' output mode without worrying about retractions.
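      The append-mode behaviour described above can be illustrated with a toy, pure-Python model. This is not Spark code: the class `AppendWindowCount` and its parameters are hypothetical. It counts events per tumbling window, drops rows older than the watermark, and emits a window only once the watermark has passed its end, so every emitted row is final and a downstream operator could consume it as a plain insert.

```python
class AppendWindowCount:
    """Hypothetical toy operator: tumbling-window count in append mode.

    Not Spark's implementation; it only mimics the semantics described above.
    """

    def __init__(self, window_size, delay):
        self.window_size = window_size
        self.delay = delay                      # allowed lateness behind max event time
        self.max_event_time = float("-inf")
        self.state = {}                         # window start -> running count

    def watermark(self):
        return self.max_event_time - self.delay

    def process(self, event_times):
        """Consume one micro-batch and return finalized (window_start, count) rows."""
        wm_before = self.watermark()            # watermark from the previous batch
        for t in event_times:
            if t < wm_before:                   # late row: dropped, never changes output
                continue
            start = t - t % self.window_size
            self.state[start] = self.state.get(start, 0) + 1
        # The watermark advances only after the batch has been processed.
        self.max_event_time = max([self.max_event_time, *event_times])
        wm = self.watermark()
        out = []
        for start in sorted(self.state):        # sorted() copies keys, so pop is safe
            if start + self.window_size <= wm:  # window end reached: emit once, evict
                out.append((start, self.state.pop(start)))
        return out


op = AppendWindowCount(window_size=10, delay=5)
print(op.process([1, 3, 12]))   # []        watermark 7, window [0, 10) still open
print(op.process([8, 16]))      # [(0, 3)]  watermark 11 closes [0, 10)
print(op.process([2, 27]))      # [(10, 2)] 2 is late and dropped; [0, 10) is not re-emitted
```

      Because a window is emitted exactly once and later rows for it are discarded, the output stream contains no updates to previously emitted rows, which is what makes chaining safe in principle.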

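      The problem is that event-time watermarks in Structured Streaming are currently tracked at a global level, and this does not work when aggregates are chained: the rows emitted by an upstream aggregate carry event times (e.g. window boundaries) that already lag behind the global watermark, so a downstream aggregate evaluating them against that same watermark would treat them as late.

      We need to track watermarks at the level of individual operators, so that each operator can make progress independently instead of relying on a global min or max value.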
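      To make the watermark problem concrete, the following toy model (plain Python, not Spark internals; `WindowSum`, `output_watermark` and `run_chain` are hypothetical names) chains a 10-unit tumbling-window count into a 20-unit window sum. With one global watermark, the second operator filters the first operator's output against that watermark and discards every row as late. If instead each operator derives its input watermark from its upstream operator (here, as an assumed conservative bound, the upstream watermark minus the upstream window size), the chained result is produced.

```python
class WindowSum:
    """Hypothetical toy append-mode operator: tumbling-window sum of (event_time, value)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.state = {}  # window start -> running sum

    def process(self, rows, wm_before, wm_after):
        # Rows older than this operator's *input* watermark are treated as late.
        for t, v in rows:
            if t < wm_before:
                continue
            start = t - t % self.window_size
            self.state[start] = self.state.get(start, 0) + v
        out = []
        for start in sorted(self.state):
            if start + self.window_size <= wm_after:  # finalized: emit once and evict
                out.append((start, self.state.pop(start)))
        return out

    def output_watermark(self, input_wm):
        # Any window emitted later has end > input_wm, so its start (the event time
        # of the emitted row) is > input_wm - window_size.
        return input_wm - self.window_size


def run_chain(batches, delay, per_operator):
    """Run count(10-unit windows) -> sum(20-unit windows) over micro-batches."""
    first, second = WindowSum(10), WindowSum(20)
    max_t = float("-inf")
    wm_before = float("-inf")  # global watermark from the previous batch
    results = []
    for batch in batches:
        max_t = max([max_t, *(t for t, _ in batch)])
        wm_after = max_t - delay  # global watermark after this batch
        mid = first.process(batch, wm_before, wm_after)
        if per_operator:
            # The downstream operator uses the upstream operator's output watermark.
            mid_before = first.output_watermark(wm_before)
            mid_after = first.output_watermark(wm_after)
        else:
            # Every operator shares the single global watermark.
            mid_before, mid_after = wm_before, wm_after
        results.extend(second.process(mid, mid_before, mid_after))
        wm_before = wm_after
    return results


# Each source row is (event_time, 1), so the first operator counts events.
BATCHES = [
    [(1, 1), (3, 1), (12, 1)],
    [(8, 1), (16, 1)],
    [(2, 1), (27, 1)],
    [(45, 1)],
]
print(run_chain(BATCHES, delay=5, per_operator=False))  # []
print(run_chain(BATCHES, delay=5, per_operator=True))   # [(0, 5)]
```

      The per-operator bound used here is only one possible choice for illustration; the issue does not specify how each operator's watermark would be derived.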


            People

              Assignee: Unassigned
              Reporter: Arun Mahadevan
              Votes: 4
              Watchers: 12
