Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
To use FileStreamSink in production, FileStreamSink's log must meet two requirements:
1. Versioning. A future Spark version should be able to read the metadata written by an older FileStreamSink.
2. Compaction. Reading from many small files is usually slow, so we should compact small metadata files into larger ones.
See the PR description for more details.
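
As a rough illustration of the two requirements, the following Python sketch models a versioned, compacting metadata log in memory. It is not the FileStreamSink implementation: the `SinkLog` class, the `v1` header line, the `.compact` file suffix, and the `compact_interval` parameter are all assumptions made for this example, and a dict stands in for the log directory.

```python
import json

VERSION = 1  # hypothetical format version, written as the first line of every log file


def serialize(entries):
    """Render one log file: a version header, then one JSON entry per line."""
    return "\n".join([f"v{VERSION}"] + [json.dumps(e) for e in entries])


def deserialize(text):
    """Parse a log file, rejecting versions newer than this reader understands."""
    header, *lines = text.split("\n")
    if not header.startswith("v") or not header[1:].isdigit():
        raise ValueError(f"malformed version header: {header!r}")
    if int(header[1:]) > VERSION:
        raise ValueError(f"unsupported log version: {header}")
    return [json.loads(line) for line in lines if line]


class SinkLog:
    """In-memory stand-in for a metadata log directory (illustrative only)."""

    def __init__(self, compact_interval=3):
        self.compact_interval = compact_interval
        self.files = {}  # file name -> file contents

    def is_compaction_batch(self, batch_id):
        # Every compact_interval-th batch is written as a compacted file.
        return (batch_id + 1) % self.compact_interval == 0

    def add(self, batch_id, entries):
        if self.is_compaction_batch(batch_id):
            # Fold all earlier entries into one larger file.
            merged = self.all_entries(batch_id - 1) + entries
            self.files[f"{batch_id}.compact"] = serialize(merged)
        else:
            self.files[str(batch_id)] = serialize(entries)

    def all_entries(self, latest_batch_id):
        """Read the newest compacted file, then only the small files after it."""
        result, start = [], 0
        for b in range(latest_batch_id, -1, -1):
            name = f"{b}.compact"
            if name in self.files:
                result = deserialize(self.files[name])
                start = b + 1
                break
        for b in range(start, latest_batch_id + 1):
            result += deserialize(self.files[str(b)])
        return result


log = SinkLog(compact_interval=3)
for batch in range(5):
    log.add(batch, [{"path": f"part-{batch}"}])
paths = [e["path"] for e in log.all_entries(4)]
```

With an interval of 3, batch 2 is written as a single compacted file, so reading all five batches touches three files (`2.compact`, `3`, `4`) instead of five. The version header lets a reader fail fast on a format it does not understand rather than misparse it.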