Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.4.0
- Fix Version/s: None
Description
When I use the Structured Streaming Parquet sink, I've noticed that the file sink log files in {$checkpoint/_spark_metadata/} keep getting bigger. I don't think this is reasonable.
And when these log files are merged, batches take longer to run, as in the screenshot below.
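One way to put numbers on this growth is to list the log files and their sizes. The following is a minimal sketch, not part of Spark: `summarize_sink_log` is a hypothetical helper, and it assumes that each log file is named after its batch id and that merged (compacted) batches carry a `.compact` suffix. Hidden checksum and temporary files are skipped.

```python
import os
import tempfile


def summarize_sink_log(metadata_dir):
    """Return (name, size_bytes, is_compact) per batch log file, ordered by batch id.

    Hypothetical helper. Assumes files are named "<batch id>" or
    "<batch id>.compact"; anything else (e.g. ".9.compact.crc") is ignored.
    """
    entries = []
    for name in os.listdir(metadata_dir):
        path = os.path.join(metadata_dir, name)
        batch_id, _, suffix = name.partition(".")
        if not os.path.isfile(path) or not batch_id.isdigit():
            continue
        if suffix not in ("", "compact"):
            continue
        entries.append((int(batch_id), name, os.path.getsize(path), suffix == "compact"))
    entries.sort()  # numeric order by batch id, not lexicographic
    return [(name, size, is_compact) for _, name, size, is_compact in entries]


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up file sizes.
    demo_dir = tempfile.mkdtemp()
    for name, size in [("0", 10), ("1", 20), ("9.compact", 300)]:
        with open(os.path.join(demo_dir, name), "wb") as f:
            f.write(b"x" * size)
    for name, size, is_compact in summarize_sink_log(demo_dir):
        kind = "compact" if is_compact else "batch"
        print(f"{name}\t{size} bytes\t{kind}")
```

Running this against the real `_spark_metadata` directory at intervals would show whether each successive `.compact` file is larger than the previous one, which is the pattern described above.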
Attachments
Issue Links
- is related to: SPARK-30462 Structured Streaming _spark_metadata fills up Spark Driver memory when having lots of objects (Resolved)