Spark / SPARK-24295

Purge Structured Streaming FileStreamSinkLog metadata compact file data


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      FileStreamSinkLog metadata log files are merged into a single compact file at every configured compaction interval.
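A minimal sketch of why the compact file keeps growing, written in Python for illustration. It assumes the compaction rule used by Spark's CompactibleFileStreamLog, where batch ids are 0-based and batch `b` is a compaction batch when `(b + 1) % compactInterval == 0`. The helper names and the "fixed number of output files per batch" model are hypothetical. Because each compaction batch rewrites every entry recorded so far, its size grows linearly with the number of batches.

```python
from typing import Dict, List


def is_compaction_batch(batch_id: int, compact_interval: int) -> bool:
    # Assumed rule: with 0-based batch ids, every compact_interval-th batch compacts.
    return (batch_id + 1) % compact_interval == 0


def simulate_sink_log(num_batches: int, files_per_batch: int,
                      compact_interval: int) -> Dict[int, int]:
    """Return {batch_id: number of entries written to that batch's log file}.

    Hypothetical model: each batch writes `files_per_batch` output files.
    """
    entries_written: Dict[int, int] = {}
    all_entries: List[str] = []  # every file ever recorded by the sink
    for batch_id in range(num_batches):
        new_files = [f"part-{batch_id:05d}-{i}" for i in range(files_per_batch)]
        all_entries.extend(new_files)
        if is_compaction_batch(batch_id, compact_interval):
            # Compact file: previous compact + deltas + this batch; nothing is dropped.
            entries_written[batch_id] = len(all_entries)
        else:
            # Ordinary delta file: only this batch's entries.
            entries_written[batch_id] = len(new_files)
    return entries_written


if __name__ == "__main__":
    log = simulate_sink_log(num_batches=30, files_per_batch=4, compact_interval=10)
    print(log[9], log[19], log[29], log[10])
```

With 4 output files per batch and an interval of 10, the compact files for batches 9, 19 and 29 hold 40, 80 and 120 entries, while an ordinary batch such as 10 holds only its own 4.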

      For long-running jobs, the compact file can grow to tens of GBs. This slows down reads of the FileStreamSink output directory, because Spark defaults to reading the list of output files from the "_spark_metadata" directory.

      We need functionality to purge old entries from the compact file so that its size stays bounded.
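One possible shape for the requested purge, sketched in Python: while building a new compact file, drop entries whose modification time falls outside a retention window. `SinkEntry`, `compact_with_purge` and the `retention_ms` parameter are hypothetical names for illustration, not an existing Spark API, and time-based retention is only one of several possible purge policies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SinkEntry:
    # Hypothetical stand-in for one entry of a sink metadata log.
    path: str
    modification_time_ms: int


def compact_with_purge(previous_compact: List[SinkEntry],
                       new_entries: List[SinkEntry],
                       now_ms: int,
                       retention_ms: int) -> List[SinkEntry]:
    """Merge entries into a new compact file, dropping those older than retention_ms."""
    cutoff = now_ms - retention_ms
    merged = previous_compact + new_entries
    # Keep only entries modified within the retention window.
    return [e for e in merged if e.modification_time_ms >= cutoff]


if __name__ == "__main__":
    prev = [SinkEntry("a", 1_000), SinkEntry("b", 5_000)]
    new = [SinkEntry("c", 9_000)]
    kept = compact_with_purge(prev, new, now_ms=10_000, retention_ms=6_000)
    print([e.path for e in kept])
```

The trade-off: in this sketch, once an entry is purged from the compact file, readers that list the sink's output through `_spark_metadata` no longer see that data file, even though the file itself stays on storage unless it is deleted separately.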

       


            People

              Assignee: Unassigned
              Reporter: Iqbal Singh (iqbal_khattra)
              Votes: 5
              Watchers: 16

              Dates

                Created:
                Updated: