[SPARK-14678] Add a file sink log to support versioning and compaction - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: Structured Streaming
Labels: None
Target Version/s: 2.0.0

Description

To use FileStreamSink in production, there are two requirements for FileStreamSink's log:

1. Versioning. A future Spark version should be able to read the metadata of an old FileStreamSink.
2. Compaction. Reading many small files is slow, so small metadata files should be compacted into larger ones.

See the PR description for more details.
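
The two requirements can be illustrated with a minimal Python sketch of a versioned, compacting metadata log. This is not the code from the PR: the version tag `v1`, the `.compact` file suffix, the interval of 3, and all function names are assumptions chosen for illustration. Each log file begins with a version line so a reader can reject formats it does not understand, and every third batch is written as a compacted file holding all entries so far.

```python
import json
import os

VERSION = "v1"        # assumed version tag, written as the first line of every log file
COMPACT_INTERVAL = 3  # assumed: every 3rd batch is written as a compacted file


def is_compaction_batch(batch_id, interval=COMPACT_INTERVAL):
    """Batches interval-1, 2*interval-1, ... are compaction batches."""
    return (batch_id + 1) % interval == 0


def batch_path(log_dir, batch_id, interval=COMPACT_INTERVAL):
    """File name is the batch id, with a '.compact' suffix for compaction batches."""
    suffix = ".compact" if is_compaction_batch(batch_id, interval) else ""
    return os.path.join(log_dir, f"{batch_id}{suffix}")


def read_batch(path):
    """Read one log file, refusing any version this reader does not understand."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    if not lines or lines[0] != VERSION:
        found = lines[0] if lines else "<empty file>"
        raise ValueError(f"unsupported log version in {path}: {found}")
    return [json.loads(line) for line in lines[1:]]


def all_entries(log_dir, latest_batch_id, interval=COMPACT_INTERVAL):
    """Return every entry up to latest_batch_id, reading as few files as possible."""
    # Most recent compaction batch at or before latest_batch_id (-1 if none yet).
    last_compact = latest_batch_id - (latest_batch_id + 1) % interval
    entries = []
    if last_compact >= 0:
        entries.extend(read_batch(batch_path(log_dir, last_compact, interval)))
    # Only the small files written after the last compaction still need reading.
    for batch_id in range(last_compact + 1, latest_batch_id + 1):
        entries.extend(read_batch(batch_path(log_dir, batch_id, interval)))
    return entries


def add_batch(log_dir, batch_id, new_entries, interval=COMPACT_INTERVAL):
    """Write one batch; a compaction batch also folds in all earlier entries."""
    entries = list(new_entries)
    if is_compaction_batch(batch_id, interval):
        entries = all_entries(log_dir, batch_id - 1, interval) + entries
    with open(batch_path(log_dir, batch_id, interval), "w", encoding="utf-8") as f:
        f.write(VERSION + "\n")
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
```

With an interval of 3, writing batches 0 through 4 leaves the files `0`, `1`, `2.compact`, `3`, and `4`; reading the full log then touches only `2.compact`, `3`, and `4`. The sketch does not delete the small files that a compaction supersedes, and it does not write atomically, both of which a production log would need to handle.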

Issue Links

links to

[Github] Pull Request #12435 (zsxwing)

People

Assignee: Shixiong Zhu

Reporter: Shixiong Zhu

Votes: 0

Watchers: 3

Dates

Created: 15/Apr/16 23:26

Updated: 01/Nov/16 22:15

Resolved: 20/Apr/16 20:33