Spark / SPARK-29995

Structured Streaming file-sink log grows indefinitely


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Structured Streaming

    Description

      When I use the Structured Streaming Parquet sink, I've noticed that the file-sink log files keep getting bigger. They are located in {$checkpoint/_spark_metadata/}. I don't think this is reasonable.

      Also, when these log files are merged, task batches take longer to run, as shown in the attached screenshots.
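
      To quantify the growth, something like the following could be run against the metadata directory. It is only a minimal sketch: it assumes the directory is reachable as a local (or mounted) path and that the log files are named `<batchId>` or `<batchId>.compact`. `summarize_sink_log` is a hypothetical helper written for this illustration, not part of Spark.

```python
import os
import re

# Assumed naming: "<batchId>" for a single batch, "<batchId>.compact" for a merged file.
_LOG_FILE = re.compile(r"^(\d+)(\.compact)?$")


def summarize_sink_log(metadata_dir):
    """Return (batch_id, is_compact, size_bytes) tuples sorted by batch id."""
    entries = []
    for name in os.listdir(metadata_dir):
        match = _LOG_FILE.match(name)
        if match is None:
            continue  # skip checksum files and anything else that is not a log file
        path = os.path.join(metadata_dir, name)
        if not os.path.isfile(path):
            continue
        batch_id = int(match.group(1))
        is_compact = match.group(2) is not None
        entries.append((batch_id, is_compact, os.path.getsize(path)))
    return sorted(entries)
```

      Calling `summarize_sink_log` on the `_spark_metadata` path at intervals and comparing the sizes of successive `.compact` entries would show whether the merged file keeps growing from one merge to the next.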

      Attachments

        1. file.png
          60 kB
          zhang liming
        2. task.png
          154 kB
          zhang liming


          People

            Assignee: Unassigned
            Reporter: zhang liming (zhangliming)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved:
