Spark / SPARK-37970

Introduce a new interface on streaming data source to notify the latest seen offset

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      We found that, for some streaming data sources, knowing the latest seen offset when a query restarts is useful for implementing certain features. One such case is letting the data source track the offset by itself when its external storage does not expose any API that provides the latest available offset.

      We propose a new interface on streaming data sources that tells Spark to provide the latest seen offset whenever the query is restarted. On the first start of the query, the initial offset of the data source should still be retrieved by calling initialOffset.
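
      The start/restart behavior described above can be sketched as a small, engine-independent simulation. This is only an illustration under stated assumptions: the class SelfTrackingSource, the callback set_latest_seen_offset, and the helper resolve_start_offset are hypothetical names chosen for this sketch, offsets are modeled as plain integers, and none of this is Spark's actual connector API. Only initialOffset (written here as initial_offset) comes from the description.

```python
from typing import Optional


class SelfTrackingSource:
    """Hypothetical source whose external storage cannot report its latest offset."""

    def __init__(self) -> None:
        # Offset the source tracks on its own; unknown until the engine reports one.
        self.latest_seen: Optional[int] = None

    def initial_offset(self) -> int:
        # Consulted only on the very first start of the query.
        return 0

    def set_latest_seen_offset(self, offset: int) -> None:
        # Hypothetical callback: the engine hands over the latest offset it has seen.
        self.latest_seen = offset


def resolve_start_offset(source: SelfTrackingSource,
                         last_seen_by_engine: Optional[int]) -> int:
    """Simulate how an engine might pick the starting offset for a query run."""
    if last_seen_by_engine is None:
        # First start: nothing has been seen yet, so ask the source.
        return source.initial_offset()
    # Restart: notify the source so it can resume its own offset tracking.
    source.set_latest_seen_offset(last_seen_by_engine)
    return last_seen_by_engine


first_run = SelfTrackingSource()
print(resolve_start_offset(first_run, None))

restarted = SelfTrackingSource()
print(resolve_start_offset(restarted, 42), restarted.latest_seen)
```

      In this sketch the source is notified only on restart, which matches the description: a source that does not implement the callback would be unaffected, and the first start continues to rely on initialOffset.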

          People

            Assignee: kabhwan Jungtaek Lim
            Reporter: kabhwan Jungtaek Lim
            Votes: 0
            Watchers: 3