SPARK-32776

Limit in streaming should not be optimized away by PropagateEmptyRelation


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.0.2, 3.1.0
    • Component/s: Structured Streaming
    • Labels:
      None

      Description

      Currently, the Limit operator in a streaming query may be optimized away by PropagateEmptyRelation when its input relation is empty. This is a problem for stateful streaming: the empty batch does not write any state store files, so the next batch fails with a file-not-found error when it tries to read them.
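The failure mode can be illustrated with a minimal sketch. This is a toy model in Python, not Spark code: `ToyStateStore`, `run_batch`, and the `limit_in_plan` flag are hypothetical names, and the sketch assumes each batch reads the state version written by the previous batch and writes the next one.

```python
class ToyStateStore:
    """Hypothetical stand-in for a versioned state store: one 'file' per committed batch."""

    def __init__(self):
        self.files = {}  # version -> state snapshot

    def load(self, version):
        # Version 0 is the initial, empty state; nothing needs to exist on disk for it.
        if version == 0:
            return {}
        if version not in self.files:
            raise FileNotFoundError(f"state file for version {version} not found")
        return dict(self.files[version])

    def commit(self, version, state):
        self.files[version] = dict(state)


def run_batch(store, batch_id, rows, limit, limit_in_plan=True):
    """Run one micro-batch; batch N reads state version N and writes version N + 1."""
    if not limit_in_plan:
        # The Limit operator was optimized away: this batch never touches the state store.
        return []
    state = store.load(batch_id)
    seen = state.get("count", 0)
    out = rows[: max(limit - seen, 0)]  # emit only what is left of the limit
    store.commit(batch_id + 1, {"count": seen + len(out)})
    return out
```

In this model, running batch 0 with rows, then batch 1 with no rows and `limit_in_plan=False`, and then batch 2 raises `FileNotFoundError`, because version 2 was never committed. If the empty batch keeps its Limit operator, it commits an unchanged state and batch 2 proceeds normally.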

      We should not let PropagateEmptyRelation optimize away the Limit operator for streaming queries.
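A minimal sketch of the intended behaviour, written as a toy Python model rather than the actual Catalyst rule: `Relation`, `Limit`, and `propagate_empty_relation` here are hypothetical stand-ins, and the only point illustrated is the guard that skips the rewrite for streaming plans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    rows: tuple
    is_streaming: bool = False


@dataclass(frozen=True)
class Limit:
    n: int
    child: object

    @property
    def is_streaming(self):
        # A Limit is streaming if its input is streaming.
        return self.child.is_streaming


def is_empty_relation(plan):
    return isinstance(plan, Relation) and len(plan.rows) == 0


def propagate_empty_relation(plan):
    """Toy rule: collapse Limit over an empty relation, except in streaming plans."""
    if isinstance(plan, Limit):
        child = propagate_empty_relation(plan.child)
        if is_empty_relation(child) and not plan.is_streaming:
            # Batch query: a limit over no rows is itself an empty relation.
            return Relation(rows=(), is_streaming=False)
        # Streaming query: keep the Limit so it still runs and writes its state.
        return Limit(plan.n, child)
    return plan
```

With this guard, a batch `Limit` over an empty `Relation` is still rewritten to an empty relation, while the same plan marked as streaming is returned unchanged.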

      This ticket is intended to apply a small and safe fix for PropagateEmptyRelation. A fundamental fix that can prevent this from happening again in the future and in other optimizer rules is more desirable, but that's a much larger task.

       

            People

            • Assignee: Liwen Sun (liwensun)
            • Reporter: Liwen Sun (liwen)
