SPARK-29995

Structured Streaming file-sink log grows indefinitely

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels:

      Description

      When I use the Structured Streaming Parquet sink, I've noticed that the file-sink log files keep getting bigger. They are in {$checkpoint/_spark_metadata/}. I don't think this is reasonable.

      Also, when the log files are merged, batches take longer to run, as shown in the screenshots below.
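
To make the growth described above easier to quantify, here is a minimal sketch that lists the log files in a metadata directory and reports the size of each merged file. It assumes per-batch logs are named `<batchId>` and merged logs `<batchId>.compact`; the function names (`list_sink_logs`, `compact_sizes`, `make_demo_dir`) and the file sizes in the demo are hypothetical and only there for illustration. Point it at your own `_spark_metadata` directory instead of the demo directory.

```python
import os
import re
import tempfile

# Assumption: per-batch logs are named "<batchId>", merged logs "<batchId>.compact".
LOG_NAME = re.compile(r"^(\d+)(\.compact)?$")


def list_sink_logs(metadata_dir):
    """Return (batch_id, is_compact, size_bytes) tuples sorted by batch id."""
    entries = []
    for name in os.listdir(metadata_dir):
        match = LOG_NAME.match(name)
        if match is None:
            continue  # skip checksum files, temp files, and anything else
        path = os.path.join(metadata_dir, name)
        if not os.path.isfile(path):
            continue
        is_compact = match.group(2) is not None
        entries.append((int(match.group(1)), is_compact, os.path.getsize(path)))
    return sorted(entries)


def compact_sizes(metadata_dir):
    """Return (batch_id, size_bytes) for merged log files only."""
    return [(batch, size) for batch, is_compact, size in list_sink_logs(metadata_dir) if is_compact]


def make_demo_dir():
    """Create a throwaway directory with made-up file sizes (demo only)."""
    demo_dir = tempfile.mkdtemp()
    fake_files = [
        ("8", 100),
        ("9.compact", 1000),
        ("10", 100),
        ("19.compact", 2000),
        (".9.compact.crc", 16),  # ignored by the name filter
    ]
    for name, size in fake_files:
        with open(os.path.join(demo_dir, name), "wb") as handle:
            handle.write(b"x" * size)
    return demo_dir


if __name__ == "__main__":
    demo = make_demo_dir()
    for batch_id, size in compact_sizes(demo):
        print(f"batch {batch_id}: {size} bytes")
```

If the sizes reported for successive merged files keep increasing over the life of the query, that matches the behavior reported here.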

        Attachments

        1. file.png
          60 kB
          zhang liming
        2. task.png
          154 kB
          zhang liming

              People

              • Assignee: Unassigned
              • Reporter: zhang liming (zhangliming)
              • Votes: 0
              • Watchers: 8
