  1. Spark
  2. SPARK-24634

Add a new metric regarding number of rows later than watermark


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Spark filters out late rows, that is, rows whose event time is older than the watermark, when applying operations that rely on event-time windows. Spark exposes the watermark itself to StreamingQueryListener, but it provides no information about the rows that are filtered out because of the watermark. That information would help end users tune the watermark while operating their queries.
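      To make the proposed count concrete, here is a minimal sketch in plain Python of a simplified watermark model. It is not Spark's implementation: the names `process_batches`, `BatchResult`, and `delay` are hypothetical, event times are plain integers, and a row is treated as late when its event time is strictly below the watermark carried over from earlier batches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BatchResult:
    kept: List[int]            # event times that passed the watermark filter
    dropped_by_watermark: int  # rows filtered out as late in this batch
    watermark: int             # watermark that was applied to this batch


def process_batches(batches: List[List[int]], delay: int) -> List[BatchResult]:
    """Simplified model (an assumption, not Spark's code): the watermark used
    for a batch is (max event time seen in earlier batches) - delay."""
    watermark = 0
    max_event_time: Optional[int] = None
    results: List[BatchResult] = []
    for batch in batches:
        # Rows older than the current watermark are considered late.
        kept = [t for t in batch if t >= watermark]
        dropped = len(batch) - len(kept)
        results.append(BatchResult(kept, dropped, watermark))
        # Advance the watermark for the next batch; it never moves backwards.
        if batch:
            batch_max = max(batch)
            if max_event_time is None or batch_max > max_event_time:
                max_event_time = batch_max
            watermark = max(watermark, max_event_time - delay)
    return results


results = process_batches([[10, 12, 15], [4, 14, 20], [9, 11, 30]], delay=5)
for i, r in enumerate(results):
    print(f"batch {i}: watermark={r.watermark} dropped={r.dropped_by_watermark}")
```

      A steadily growing dropped count in a model like this is the signal the issue wants to surface: it suggests the delay threshold is too tight for how late the data actually arrives.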

      We could expose a metric for the number of rows that arrive later than the watermark and are filtered out. Supporting a side output for consuming late rows would be ideal, but that does not look easy, so this issue addresses the metric first.
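      As an illustration of how such a metric might be consumed, the sketch below sums a per-operator counter from a query progress report represented as a plain dictionary. The field name `numRowsDroppedByWatermark` under `stateOperators` is an assumption about how the metric is exposed and is not stated in this issue; the helper `rows_dropped_by_watermark` and the sample values are hypothetical.

```python
from typing import Any, Dict


def rows_dropped_by_watermark(progress: Dict[str, Any]) -> int:
    """Sum the late-row counter over all stateful operators in one progress
    report. The field name is an assumption; missing fields count as 0."""
    return sum(
        int(op.get("numRowsDroppedByWatermark", 0))
        for op in progress.get("stateOperators", [])
    )


# Made-up progress report for illustration only, not real query output.
sample_progress = {
    "batchId": 3,
    "stateOperators": [
        {"numRowsTotal": 120, "numRowsUpdated": 15, "numRowsDroppedByWatermark": 5},
        {"numRowsTotal": 40, "numRowsUpdated": 2, "numRowsDroppedByWatermark": 2},
    ],
}

print(rows_dropped_by_watermark(sample_progress))
```

      Logging this value from a listener callback on every progress event would let an operator see when the watermark starts discarding data.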


          People

            Assignee: Jungtaek Lim (kabhwan)
            Reporter: Jungtaek Lim (kabhwan)
            Votes: 1
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
