  1. Spark
  2. SPARK-39940

Batch query cannot read the updates from streaming query if streaming query writes to the catalog table via DSv1 sink


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.1, 3.2.3, 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      (I think this is a long-standing issue, but there is no good way to list all affected versions, so I just picked the most recent version in each release line.)

      When a streaming query writes to a catalog table via a DSv1 sink, the destination table is not refreshed or invalidated, so a batch query against the destination table is not guaranteed to read the latest "committed" updates.
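
      The staleness described above can be pictured with a small toy model. This is only a sketch in plain Python, not Spark code: `ToyTable`, `ToyCatalog`, and `commit_batch` are hypothetical names invented for illustration, and the assumption that batch reads are served from a snapshot cached per table is a simplification. In Spark itself, the manual counterpart of the `refresh_table` step is `spark.catalog.refreshTable`, which invalidates the cached data and metadata of a table.

```python
# Toy model only: none of these classes exist in Spark.
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Storage for one table: the rows that have been committed so far."""
    rows: list = field(default_factory=list)


class ToyCatalog:
    """Serves batch reads from a per-table snapshot cached on first read."""

    def __init__(self):
        self.tables = {}
        self._cache = {}

    def read_table(self, name):
        # Reuse the cached snapshot if there is one; otherwise take a new one.
        if name not in self._cache:
            self._cache[name] = list(self.tables[name].rows)
        return self._cache[name]

    def refresh_table(self, name):
        # Drop the snapshot so the next read sees all committed rows.
        self._cache.pop(name, None)


def commit_batch(catalog, name, batch, refresh):
    """Append one micro-batch; optionally invalidate the cached snapshot."""
    catalog.tables[name].rows.extend(batch)
    if refresh:
        catalog.refresh_table(name)


catalog = ToyCatalog()
catalog.tables["events"] = ToyTable()

commit_batch(catalog, "events", [1, 2], refresh=False)
first_read = catalog.read_table("events")   # snapshot is taken here

commit_batch(catalog, "events", [3], refresh=False)
stale_read = catalog.read_table("events")   # served from the old snapshot

commit_batch(catalog, "events", [4], refresh=True)
fresh_read = catalog.read_table("events")   # snapshot rebuilt after refresh
```

      In this model the second read misses the row committed without a refresh, while the read after a refreshing commit sees every committed row. Refreshing as part of the commit, rather than leaving it to the reader, is the design choice the sketch is meant to highlight.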


          People

            Assignee: Jungtaek Lim (kabhwan)
            Reporter: Jungtaek Lim (kabhwan)
            Votes: 0
            Watchers: 2

