Spark / SPARK-38078

Aggregation with watermark in Append mode is holding data beyond the watermark boundary.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

       I am struggling with an unusual issue. I am not sure whether my understanding is wrong or this is a bug in Spark.
       

      1.  Reading a stream from Event Hubs/Kafka (Extract).
      2.  Pivoting and aggregating the above DataFrame (Transformation). This is a watermarked aggregation.
      3.  Writing the aggregation to the console/Delta table in Append mode with a trigger.
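
      The three steps above could be sketched in PySpark roughly as follows. This is an illustration only, not the reporter's actual job: the broker address, topic name, 10-minute watermark, 5-minute window, and 1-minute trigger are all assumed values, the function name build_query is hypothetical, the pivot step is left out for brevity, and the Kafka source requires the spark-sql-kafka connector on the classpath.

```python
def build_query(spark, bootstrap_servers="localhost:9092", topic="events"):
    """Start a watermarked windowed count that writes to the console in Append mode.

    All default values here are assumptions made for illustration.
    """
    # Imported here so the sketch can be loaded without a Spark installation.
    from pyspark.sql import functions as F

    # 1. Extract: the Kafka source exposes a `timestamp` column for each record.
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )
    events = raw.select(
        F.col("timestamp").alias("event_time"),
        F.col("value").cast("string").alias("payload"),
    )

    # 2. Transform: watermarked aggregation over tumbling event-time windows.
    counts = (
        events.withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"))
        .count()
    )

    # 3. Load: Append mode only emits a window once the watermark has passed its end.
    return (
        counts.writeStream.outputMode("append")
        .format("console")
        .option("truncate", "false")
        .trigger(processingTime="1 minute")
        .start()
    )
```

      Given an existing SparkSession named spark, calling build_query(spark) would return a StreamingQuery handle for the running job.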

      However, the most recently published message to Event Hubs is not written to the console/Delta table, even after it has fallen outside the watermark time.
       
       My understanding is that the event should be inserted into the Delta table once event time + watermark delay has elapsed.
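
       One possible explanation for the gap between this expectation and the observed output: in Structured Streaming the watermark is derived from the largest event time seen in the stream minus the delay, not from wall-clock time. The toy model below is a simplified sketch and not Spark's implementation. The class name, the 5-minute window, and the 10-minute delay are assumptions, and it collapses Spark's batch bookkeeping into a single step. It shows the window that holds the newest event staying in state until a later event moves the watermark past that window's end.

```python
WINDOW_SECONDS = 300   # assumed 5-minute tumbling window
DELAY_SECONDS = 600    # assumed 10-minute watermark delay


class AppendModeSimulator:
    """Toy model of a watermarked windowed count in Append mode."""

    def __init__(self):
        self.open_windows = {}                # window start -> running count
        self.max_event_time = float("-inf")   # largest event time seen so far
        self.watermark = float("-inf")        # max_event_time - DELAY_SECONDS

    def process_batch(self, event_times):
        """Ingest one micro-batch and return the windows it finalizes."""
        for ts in event_times:
            start = ts - ts % WINDOW_SECONDS
            # Rows whose window has already been closed are dropped as too late.
            if start + WINDOW_SECONDS <= self.watermark:
                continue
            self.open_windows[start] = self.open_windows.get(start, 0) + 1
            self.max_event_time = max(self.max_event_time, ts)

        # The watermark moves only when newer event times arrive,
        # never because wall-clock time has passed.
        self.watermark = self.max_event_time - DELAY_SECONDS

        finalized = sorted(
            (start, count)
            for start, count in self.open_windows.items()
            if start + WINDOW_SECONDS <= self.watermark
        )
        for start, _ in finalized:
            del self.open_windows[start]
        return finalized


sim = AppendModeSimulator()
first = sim.process_batch([0, 100, 400])  # watermark is 400 - 600 = -200
idle = sim.process_batch([])              # no new events, watermark unchanged
later = sim.process_batch([1000])         # watermark becomes 1000 - 600 = 400
print(first, idle, later)                 # [] [] [(0, 2)]
```

       In this model the window starting at 300, which holds the event at time 400, is still held after the third batch. It would only be emitted once an event with a time of at least 1200 arrives.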
       

      Moreover, before the query stops, all events held in memory should be flushed to the sink, irrespective of the watermark, so that the shutdown is graceful.

       

      Please advise.

          People

            Assignee: Unassigned
            Reporter: KKdataengineer krishna
            Votes: 0
            Watchers: 3
