Details
- Type: Bug
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Component/s: None
Description
FileStreamSinkLog metadata logs are concatenated into a single compact file after each defined compact interval.
For long-running jobs, the compact file can grow to tens of gigabytes, causing slowness when reading the data, since Spark defaults to reading the "_spark_metadata" directory for the read path.
We need functionality to purge old entries from the compact file so its size stays bounded.
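The unbounded growth can be illustrated with a small simulation (plain Python, not Spark code; `compact_sizes` and its parameters are hypothetical names for illustration): every compact batch rewrites all entries seen so far into one file, so each successive compact file is strictly larger than the last. The compact interval itself is configurable in Spark via `spark.sql.streaming.fileSink.log.compactInterval` (default 10), but without retention the entry count still grows with the number of batches.

```python
def compact_sizes(num_batches, compact_interval=10, entries_per_batch=1):
    """Return the entry count of each successive compact file.

    Models the behavior described above: a compaction runs every
    `compact_interval` batches and carries forward *all* prior entries,
    so no entry is ever dropped.
    """
    sizes = []
    total = 0
    for batch in range(num_batches):
        total += entries_per_batch
        # A compact batch concatenates every entry seen so far into one file.
        if (batch + 1) % compact_interval == 0:
            sizes.append(total)
    return sizes

print(compact_sizes(50))  # → [10, 20, 30, 40, 50]
```

The strictly increasing sizes are the point of the report: with many output files per batch, the last compact file eventually dominates read time for the sink's metadata directory.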
Attachments
Issue Links
- is related to: SPARK-30462 Structured Streaming _spark_metadata fills up Spark Driver memory when having lots of objects (Resolved)
- relates to: SPARK-27188 FileStreamSink: provide a new option to have retention on output files (Resolved)