Details
- New Feature
- Status: In Progress
- Major
- Resolution: Unresolved
- 3.1.0
- None
- None
Description
Currently, with SPARK-15698, the FileStreamSource metadata log is compacted periodically (every 10 batches by default). This means each compacted batch file contains every file entry processed so far, so over time the compacted batch file grows into a relatively large file.
With SPARK-17165, FileStreamSource no longer tracks aged file entries, but the log still keeps the full records. This is unnecessary and makes recovery quite time-consuming. This issue therefore proposes adding the ability to purge aged file entries from the FileStreamSource metadata log as well.
This is pending on SPARK-15698.
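
To make the proposal concrete, here is a minimal sketch of purging aged entries at compaction time. It is written in plain Python rather than against Spark's actual classes, so every name in it (FileEntry, compact_with_purge, max_file_age_ms) is hypothetical. It assumes the purge threshold is the latest seen file timestamp minus a maximum file age, which is one plausible way to match the aging behaviour described above, not necessarily what the final patch does.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one file record in the metadata log."""
    path: str
    timestamp: int  # file modification time, in milliseconds
    batch_id: int


def compact_with_purge(entries: List[FileEntry],
                       max_file_age_ms: int) -> List[FileEntry]:
    """Return the entries a compacted batch file would keep.

    Assumption: an entry is "aged" when its timestamp is older than
    (latest timestamp among all entries - max_file_age_ms).
    """
    if not entries:
        return []
    latest = max(e.timestamp for e in entries)
    threshold = latest - max_file_age_ms
    # Keep only entries the source would still track; drop the rest
    # so the compacted file stops growing without bound.
    return [e for e in entries if e.timestamp >= threshold]


entries = [
    FileEntry("/data/a.json", 1_000, 0),
    FileEntry("/data/b.json", 5_000, 1),
    FileEntry("/data/c.json", 9_000, 2),
]
compacted = compact_with_purge(entries, max_file_age_ms=5_000)
print([e.path for e in compacted])
```

With a maximum age of 5,000 ms the threshold is 4,000 ms, so only the first entry is dropped. Recovery then only has to read the entries that are still relevant to the source.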
Issue Links
- depends upon: SPARK-15698 Ability to remove old metadata for structure streaming MetadataLog (Resolved)
- links to