  1. Spark
  2. SPARK-17604

Support purging aged file entry for FileStreamSource metadata log


Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      With SPARK-15698, the FileStreamSource metadata log is compacted periodically (every 10 batches by default), which means each compacted batch file contains every file entry processed so far. Over time, the compacted batch file grows into a relatively large file.
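      The growth described above can be illustrated with a small model. This is a hypothetical sketch, not Spark's implementation: it assumes each batch records a fixed number of file entries, that a batch is a compaction batch when `(batch_id + 1) % compact_interval == 0`, and that a compacted batch file holds every entry recorded so far. The helper names are invented for illustration.

```python
from typing import List, Tuple


def is_compaction_batch(batch_id: int, compact_interval: int = 10) -> bool:
    # Assumed rule: with 0-based ids, batches 9, 19, 29, ... are compacted.
    return (batch_id + 1) % compact_interval == 0


def compacted_file_sizes(num_batches: int, files_per_batch: int,
                         compact_interval: int = 10) -> List[Tuple[int, int]]:
    """Return (batch_id, entry_count) for each compacted batch file.

    Without purging, a compacted file carries every entry seen so far,
    so its entry count grows linearly with the number of batches.
    """
    sizes = []
    total_entries = 0
    for batch_id in range(num_batches):
        total_entries += files_per_batch  # entries recorded by this batch
        if is_compaction_batch(batch_id, compact_interval):
            sizes.append((batch_id, total_entries))
    return sizes
```

      For example, `compacted_file_sizes(30, 100)` returns `[(9, 1000), (19, 2000), (29, 3000)]`: each compacted file is larger than the previous one, even though older entries may no longer matter to the source.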

      With SPARK-17165, FileStreamSource no longer tracks aged file entries, but the log still keeps the full records. Keeping them is unnecessary and makes recovery quite time-consuming. This issue therefore proposes adding the ability to purge aged file entries from the FileStreamSource metadata log as well.
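      A purging compaction could be sketched as follows. This is a minimal, hypothetical model rather than Spark's code: `FileEntry`, `compact_with_purge`, and `max_file_age_ms` are invented names, and the assumed purge rule (drop entries more than `max_file_age_ms` older than the newest entry) is only loosely modeled on the file source's `maxFileAge` option. A real implementation would need to use exactly the same criterion FileStreamSource uses, so the log never drops an entry the source still relies on.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical log record: file path plus its modification timestamp in ms.
    path: str
    timestamp_ms: int


def compact_with_purge(entries: List[FileEntry], max_file_age_ms: int) -> List[FileEntry]:
    """Return the entries a purging compaction would keep.

    An entry is treated as aged when it is more than max_file_age_ms older
    than the newest entry in the log (an assumption for this sketch).
    """
    if not entries:
        return []
    latest = max(e.timestamp_ms for e in entries)
    threshold = latest - max_file_age_ms
    # Keep only entries the source could still need; drop aged ones.
    return [e for e in entries if e.timestamp_ms >= threshold]
```

      With `entries = [FileEntry("a", 1_000), FileEntry("b", 5_000), FileEntry("c", 9_000)]`, calling `compact_with_purge(entries, 5_000)` keeps only `b` and `c`, so the compacted file, and the data read back during recovery, no longer grows without bound.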

      This work depends on SPARK-15698.


            People

              Assignee: Unassigned
              Reporter: Saisai Shao (jerryshao)
              Votes: 2
              Watchers: 8
