SPARK-42549

Allow multiple event time columns in single DataFrame

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      This is a follow-up task for SPARK-42376.

      I've observed that all stateful operators deal with the event time column by finding the "first" event time column and picking it up. This was safe before multiple stateful operators were supported, because the only way a single DataFrame could contain more than one event time column was a stream-stream join, and we didn't support adding another stateful operator after it. It was therefore safe to assume there was only one event time column.
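
The "first event time column" lookup can be pictured with a small toy model. The sketch below is plain Python, not Spark code: `Column`, `is_event_time`, and `first_event_time_column` are hypothetical names chosen for illustration, and it assumes only that each output column carries a flag saying whether a watermark was defined on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    """Toy stand-in for an output column (hypothetical, not a Spark class)."""
    name: str
    is_event_time: bool = False


def first_event_time_column(schema: List[Column]) -> Optional[Column]:
    """Return the first column flagged as event time, or None if there is none."""
    return next((c for c in schema if c.is_event_time), None)


# A single stream with one watermarked column: the choice is unambiguous.
clicks = [Column("click_id"), Column("click_time", is_event_time=True)]

# Output of a stream-stream join: both inputs contribute an event time column.
impressions = [Column("ad_id"), Column("impression_time", is_event_time=True)]
joined = impressions + clicks

picked_single = first_event_time_column(clicks)
picked_joined = first_event_time_column(joined)
# For the joined schema the result depends only on column order,
# and the other event time column is silently ignored.
```

In this model, swapping the join sides changes which column is picked, which is the kind of ambiguity a downstream stateful operator would inherit.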

      With the support of multiple stateful operators, the assumption no longer holds. Users can add another stateful operator after a stream-stream join, which breaks the assumption. Even in the case of (stream-stream join)-stream join, either input side of a stream-stream join can itself have multiple event time columns.

      Since it's non-trivial to reason about the right behavior for all stateful operators, SPARK-42376 simply disallows a DataFrame from having more than one event time column. (The check is performed in each stateful operator against its input DataFrame.)
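
As a rough illustration of such a check (a plain-Python toy model with hypothetical names, not the actual SPARK-42376 code), a stateful operator could count the event time columns in its input and fail when it finds more than one:

```python
from typing import Dict, List


def check_single_event_time_column(schema: Dict[str, bool], operator: str) -> List[str]:
    """Validate a stateful operator's input schema in this toy model.

    `schema` maps column name -> whether the column is an event time column.
    Returns the event time columns found (zero or one); raises otherwise.
    """
    event_time_cols = [name for name, is_event_time in schema.items() if is_event_time]
    if len(event_time_cols) > 1:
        raise ValueError(
            f"{operator}: input has more than one event time column: "
            f"{', '.join(event_time_cols)}"
        )
    return event_time_cols


# Input of an aggregation placed directly on a single watermarked stream: accepted.
ok = check_single_event_time_column(
    {"click_id": False, "click_time": True}, "aggregation"
)

# Input of an aggregation placed after a stream-stream join: rejected.
try:
    check_single_event_time_column(
        {"impression_time": True, "click_time": True}, "aggregation"
    )
    rejected = False
except ValueError:
    rejected = True
```

Rejecting the input outright is the conservative choice: it avoids silently picking one column while the correct semantics for each operator are still undecided.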

      This ticket tracks the effort to reason about the right behavior for all stateful operators, and to support more than one event time column in the input DataFrame of stateful operators.

          People

            Assignee: Unassigned
            Reporter: Jungtaek Lim (kabhwan)