[SPARK-29995] Structured Streaming file-sink log grow indefinitely - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: Structured Streaming
Labels:
- bulk-closed

Description

When I use the Structured Streaming Parquet sink, I've noticed that the file-sink log files in {$checkpoint/_spark_metadata/} keep getting bigger. I don't think this is reasonable.

When these files are merged, batches also take longer to run, as shown in the screenshot below.
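
To put numbers on the growth described above, the log directory can be listed and the file sizes compared across batches. The following is a minimal sketch, not part of the original report: it assumes the log directory is reachable as a local path, that each batch file is named by its numeric batch id, and that merged files carry a `.compact` suffix. The function name `metadata_log_sizes` and the example path are hypothetical.

```python
import os
import tempfile


def metadata_log_sizes(metadata_dir):
    """Return (batch_id, is_compact, size_bytes) tuples ordered by batch id.

    Assumes batch files are named "<id>" or "<id>.compact"; anything else
    (hidden checksum files, temporary files) is skipped.
    """
    entries = []
    for name in os.listdir(metadata_dir):
        stem, _, suffix = name.partition(".")
        if not stem.isdigit() or suffix not in ("", "compact"):
            continue  # not a batch log file under the naming assumption above
        path = os.path.join(metadata_dir, name)
        if os.path.isfile(path):
            entries.append((int(stem), suffix == "compact", os.path.getsize(path)))
    return sorted(entries)


if __name__ == "__main__":
    # Hypothetical location; replace with the real _spark_metadata directory.
    log_dir = "/tmp/checkpoint/_spark_metadata"
    if os.path.isdir(log_dir):
        for batch_id, is_compact, size in metadata_log_sizes(log_dir):
            kind = "compact" if is_compact else "delta"
            print(f"batch {batch_id:>8} {kind:>8} {size:>12} bytes")
```

If the sizes of successive `.compact` files keep increasing while the other files stay small, that matches the behaviour reported here.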

Attachments

- task.png (154 kB), added 22/Nov/19 04:19 by zhang liming
- file.png (60 kB), added 22/Nov/19 04:19 by zhang liming

Issue Links

is related to

SPARK-30462 Structured Streaming _spark_metadata fills up Spark Driver memory when having lots of objects

Resolved

People

Assignee: Unassigned

Reporter: zhang liming

Votes: 0

Watchers: 6

Dates

Created: 22/Nov/19 04:16

Updated: 25/May/21 01:54

Resolved: 25/May/21 01:40