Spark / SPARK-22017

watermark evaluation with multi-input stream operators is unspecified

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Structured Streaming
    • Labels:
      None

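    Description

    Watermarks are stored as a single value in StreamExecution. If a query has multiple watermark nodes (which generally only happens with multi-input operators such as union), a headOption call arbitrarily picks one of them as the query's watermark. This choice is made independently in each batch, so different batches may use different nodes' watermarks, leading to inconsistent and undefined behavior.

    We should instead take the minimum across all watermark exec nodes as the query-wide watermark. The minimum is the conservative choice: it never advances past a point that any individual input still considers on time, so no input's data is treated as late because another input has progressed further.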
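The difference between the headOption behavior and the proposed minimum can be sketched in plain Python. This is a hypothetical model, not Spark's implementation: `head_option_watermark`, `min_watermark`, the node names and the millisecond values are all illustrative. Clamping the result so the watermark never moves backwards between batches is an added assumption that this issue does not state.

```python
from typing import Mapping, Optional


def head_option_watermark(node_watermarks: Mapping[str, int]) -> Optional[int]:
    """Models the buggy behavior: use whichever node happens to come first."""
    return next(iter(node_watermarks.values()), None)


def min_watermark(node_watermarks: Mapping[str, int], previous_ms: int = 0) -> int:
    """Query-wide watermark as the minimum over all watermark nodes.

    Assumption (not from the issue): the result is clamped to previous_ms
    so the watermark never moves backwards from one batch to the next.
    """
    if not node_watermarks:
        return previous_ms
    return max(previous_ms, min(node_watermarks.values()))


# Two hypothetical union inputs whose watermarks have advanced differently.
batch = {"stream_a": 10_000, "stream_b": 4_000}
reordered = {"stream_b": 4_000, "stream_a": 10_000}

print(min_watermark(batch), min_watermark(reordered))                   # 4000 4000
print(head_option_watermark(batch), head_option_watermark(reordered))   # 10000 4000
```

Because `min_watermark` depends only on the set of values, visiting the nodes in a different order (as a different plan traversal might) cannot change the result, whereas the head-based pick flips between 10000 and 4000.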

    People

    • Assignee: Unassigned
    • Reporter: Jose Torres (joseph.torres)
    • Votes: 0
    • Watchers: 3
