  1. Spark
  2. SPARK-29995

Structured Streaming file-sink log grows indefinitely


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Structured Streaming

    Description

      When I use the Structured Streaming Parquet sink, the file-sink log files in {$checkpoint/_spark_metadata/} keep getting bigger. I don't think this is reasonable.

      Also, when the log files are merged, the batches take longer to run, as the attached screenshots show.
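
      The growth pattern described above can be illustrated with a toy model. This is only a sketch, not Spark's code: it assumes that every `compact_interval`-th batch writes a merged log file containing all entries written so far, and that no entries are ever dropped from it. The function names, the interval of 10, and the number of files per batch are assumptions chosen for illustration.

```python
def is_compaction_batch(batch_id: int, compact_interval: int) -> bool:
    # Assumed rule: the last batch of each interval writes the merged log file.
    return (batch_id + 1) % compact_interval == 0


def simulate_log_sizes(num_batches: int, files_per_batch: int,
                       compact_interval: int = 10):
    """Return (batch_id, entries_in_log_file) for each batch in the toy model."""
    total_entries = 0
    sizes = []
    for batch_id in range(num_batches):
        total_entries += files_per_batch
        if is_compaction_batch(batch_id, compact_interval):
            # The merged file carries every entry written so far,
            # so its size grows linearly with the number of batches.
            sizes.append((batch_id, total_entries))
        else:
            # A regular batch only logs its own new files.
            sizes.append((batch_id, files_per_batch))
    return sizes


if __name__ == "__main__":
    for batch_id, entries in simulate_log_sizes(40, 5):
        if is_compaction_batch(batch_id, 10):
            print(f"batch {batch_id}: merged log holds {entries} entries")
```

      Under these assumptions the merged file never shrinks, which would match both symptoms reported here: larger log files over time and slower batches whenever a merge happens.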

      Attachments

        1. task.png
          154 kB
          zhang liming
        2. file.png
          60 kB
          zhang liming

            People

              Assignee: Unassigned
              Reporter: zhang liming
              Votes: 0
              Watchers: 6
