[SPARK-22017] watermark evaluation with multi-input stream operators is unspecified

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      The watermark is stored as a single value in StreamExecution. If a query has multiple watermark nodes (which can generally only happen with multi-input operators such as union), a headOption call arbitrarily picks one of them to use as the query's watermark. The choice is made independently in each batch, so the effective watermark can differ from one batch to the next, possibly leading to strange and undefined behavior.

      We should instead choose the minimum from all watermark exec nodes as the query-wide watermark.
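
      As a rough illustration of the proposed rule (not the actual Spark code), the sketch below models each watermark node as a plain integer timestamp in milliseconds. The function name query_watermark_ms and the sample values are hypothetical. Taking the minimum makes the result independent of the order in which the nodes are visited, unlike picking the first one.

```python
from typing import Iterable, Optional


def query_watermark_ms(operator_watermarks_ms: Iterable[int]) -> Optional[int]:
    """Return the query-wide watermark as the minimum over all watermark nodes.

    Hypothetical helper: each element stands for the watermark (event time,
    in ms) reported by one watermark node in the plan. Returns None when the
    query has no watermark nodes at all.
    """
    candidates = list(operator_watermarks_ms)
    return min(candidates) if candidates else None


# Assumed example: a union of two inputs whose watermarks have advanced
# at different rates.
fast_input_ms = 10_000
slow_input_ms = 4_000

# "Pick the first node" depends on traversal order ...
first_a = [fast_input_ms, slow_input_ms][0]
first_b = [slow_input_ms, fast_input_ms][0]

# ... whereas the minimum does not.
min_a = query_watermark_ms([fast_input_ms, slow_input_ms])
min_b = query_watermark_ms([slow_input_ms, fast_input_ms])
```

      With these sample values, the first-node choice yields a different watermark depending on ordering, while the minimum is the slower input's watermark in both cases, so data from the slower input is not treated as late prematurely.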

          People

            Assignee: Unassigned
            Reporter: Jose Torres (joseph.torres)