Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-22230

agg(last('attr)) gives weird results for streaming

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.2.0
    • 2.3.0
    • Structured Streaming
    • None

    Description

      In stream aggregation, last('attr) yields the last value from the first microbatch forever. I'm not sure if it's fair to call this a correctness bug, since last doesn't have strong correctness semantics, but ignoring all rows past the first microbatch is at least weird.

      Simple repro in StreamingAggregationSuite:

      val input = MemoryStream[Int]

      val aggregated = input.toDF().agg(last('value))
      testStream(aggregated, OutputMode.Complete())(
      AddData(input, 1, 2, 3),
      CheckLastBatch(3),
      AddData(input, 4, 5, 6),
      CheckLastBatch(6) // actually yields 3 again

      Attachments

        Activity

          People

            joseph.torres Jose Torres
            joseph.torres Jose Torres
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: