SPARK-32776

Limit in streaming should not be optimized away by PropagateEmptyRelation

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.0.2, 3.1.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Currently, the optimizer may remove the limit operator from a streaming query when its input relation is empty. This is a problem for stateful streaming: the empty batch then writes no state store files, and the next batch fails with a file-not-found error when it tries to read them.

      We should not let PropagateEmptyRelation optimize away the Limit operator for streaming queries.

      This ticket is intended to apply a small, safe fix to PropagateEmptyRelation. A more fundamental fix that prevents the same problem from recurring, including in other optimizer rules, would be preferable, but it is a much larger task.
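
      The guard described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's implementation: the `Plan` class, the `streaming` flag, and the `propagate_empty` function are hypothetical stand-ins for Catalyst's logical plan and the PropagateEmptyRelation rule. It shows an empty-relation rule that collapses operators over an empty input but leaves a limit in place when the plan is streaming.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Toy logical plan node (hypothetical; not a Spark class)."""
    op: str                          # "relation", "limit", or "project"
    child: Optional["Plan"] = None
    rows: int = 0                    # only meaningful for "relation"
    streaming: bool = False


def is_empty_relation(plan: Plan) -> bool:
    return plan.op == "relation" and plan.rows == 0


def propagate_empty(plan: Plan) -> Plan:
    """Collapse unary operators over an empty relation, except streaming limits."""
    if plan.child is None:
        return plan
    child = propagate_empty(plan.child)
    if is_empty_relation(child):
        if plan.op == "limit" and plan.streaming:
            # Keep the limit: in this model it stands for a stateful operator
            # that must still run on an empty batch so its state gets written.
            return Plan(plan.op, child, streaming=True)
        return Plan("relation", rows=0, streaming=plan.streaming)
    return Plan(plan.op, child, streaming=plan.streaming)


# Same query shape, once as a batch plan and once as a streaming plan.
batch = Plan("limit", Plan("relation", rows=0))
stream = Plan("limit", Plan("relation", rows=0, streaming=True), streaming=True)

print(propagate_empty(batch).op)   # relation
print(propagate_empty(stream).op)  # limit
```

      In the batch case the limit is replaced by an empty relation, as before. In the streaming case the limit survives, so in this model the operator would still execute on the empty batch.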

       

          People

            Assignee: Liwen Sun (liwensun)
            Reporter: Liwen Sun (liwen)
            Votes: 0
            Watchers: 3
