  1. Spark
  2. SPARK-8360 Structured Streaming (aka Streaming DataFrames)
  3. SPARK-14678

Add a file sink log to support versioning and compaction

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      To use FileStreamSink in production, there are two requirements for FileStreamSink's log:

      1. Versioning. A future Spark version should be able to read the metadata written by an older FileStreamSink.
      2. Compaction. Reading many small files is usually slow, so small metadata files should be compacted into larger ones.

      See the PR description for more details.
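
The two requirements above can be illustrated with a minimal, self-contained sketch. It is not the Spark implementation: the class `ToySinkLog`, the `v1` version tag, the `.compact` file suffix, and the `compact_interval` parameter are all assumptions chosen for illustration. Each batch writes one small log file whose first line is a version tag, and every `compact_interval`-th batch writes a single file that folds in all earlier entries, so a reader only has to open the latest compacted file plus the few batches after it.

```python
import json
import os
import tempfile

VERSION = "v1"  # hypothetical version tag, written as the first line of every log file


class ToySinkLog:
    """Toy metadata log: one file per batch, compacted every `compact_interval` batches."""

    def __init__(self, path, compact_interval=3):
        self.path = path
        self.compact_interval = compact_interval
        os.makedirs(path, exist_ok=True)

    def _is_compaction_batch(self, batch_id):
        return (batch_id + 1) % self.compact_interval == 0

    def _file_for(self, batch_id):
        suffix = ".compact" if self._is_compaction_batch(batch_id) else ""
        return os.path.join(self.path, f"{batch_id}{suffix}")

    def _read_file(self, file_path):
        with open(file_path) as f:
            lines = f.read().splitlines()
        # Versioning: refuse files whose header this reader does not understand.
        if not lines or lines[0] != VERSION:
            raise ValueError(f"unsupported log version in {file_path}")
        return [json.loads(line) for line in lines[1:]]

    def add(self, batch_id, entries):
        # Compaction: a compaction batch also carries every earlier entry.
        if self._is_compaction_batch(batch_id):
            entries = self.all_entries(up_to=batch_id - 1) + entries
        with open(self._file_for(batch_id), "w") as f:
            f.write(VERSION + "\n")
            for entry in entries:
                f.write(json.dumps(entry) + "\n")

    def all_entries(self, up_to):
        # Start from the latest compacted file at or before `up_to`, if any.
        start = 0
        for batch_id in range(up_to, -1, -1):
            if self._is_compaction_batch(batch_id):
                start = batch_id
                break
        result = []
        for batch_id in range(start, up_to + 1):
            result.extend(self._read_file(self._file_for(batch_id)))
        return result


# Usage: five batches with a compaction interval of three.
log = ToySinkLog(tempfile.mkdtemp(), compact_interval=3)
for batch_id in range(5):
    log.add(batch_id, [{"path": f"part-{batch_id}"}])
```

In this sketch, listing all entries after batch 4 reads three files (the compacted file for batch 2, then batches 3 and 4) rather than five. The version header is what would let a later reader either parse or explicitly reject an older layout instead of misreading it.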

          People

            Assignee: zsxwing Shixiong Zhu
            Reporter: zsxwing Shixiong Zhu
            Votes: 0
            Watchers: 3
