Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27424

Joining of one stream against the most recent update in another stream

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.1.0
    • None
    • Structured Streaming
    • None

    Description

      Currently, adding the most recent update of a row with a given key to another stream is not possible. This situation arises if one wants to use the current state, of one object, for example when joining the room temperature with the current weather.

      This ticket covers creation of a stream_lead and modification of the streaming join logic (and state store) to additionally allow joins of the form

      SELECT *
      FROM A, B
      WHERE 
          A.key = B.key 
          AND A.time >= B.time 
          AND A.time < stream_lead(B.time)
      

      The major aspect of this change is that we actually need a third watermark to cover how late updates may come.

      A rough sketch may be found in the attached document.

      Attachments

        1. join-last-update-design.pdf
          775 kB
          Thilo Schneider

        Activity

          People

            Unassigned Unassigned
            thiloschneider Thilo Schneider
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: