  1. Spark
  2. SPARK-24295

Purge Structured Streaming FileStreamSinkLog metadata compact file data


    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels:
      None

      Description

      FileStreamSinkLog metadata logs are concatenated into a single compact file after each defined compact interval.

      For long-running jobs, the compact file can grow to tens of GB. This slows down reads of the sink's output, because Spark defaults to the "_spark_metadata" dir (the FileStreamSinkLog dir) for the read.

      We need functionality to purge data from the compact file so that its size stays bounded.
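
      The growth described above can be illustrated with a small, self-contained model. This is a simplified sketch, not Spark's implementation: `simulate_sink_log` and its `keep` predicate are hypothetical names introduced here, and the rule that a batch is a compaction batch when `(batch_id + 1) % compact_interval == 0` is an assumption about how the compact interval is applied. Each log entry is represented only by the id of the batch that wrote it.

```python
from typing import Callable, Dict, List, Optional


def is_compaction_batch(batch_id: int, compact_interval: int) -> bool:
    # Assumed rule: batches interval-1, 2*interval-1, ... trigger compaction.
    return (batch_id + 1) % compact_interval == 0


def simulate_sink_log(
    num_batches: int,
    files_per_batch: int,
    compact_interval: int,
    keep: Optional[Callable[[int, int], bool]] = None,
) -> Dict[int, int]:
    """Return {compaction_batch_id: number of entries in that compact file}.

    keep is a hypothetical purge predicate (entry_batch_id, current_batch_id);
    None models the behavior reported in this issue, where nothing is dropped.
    """
    compacted: List[int] = []  # entries carried in the latest compact file
    pending: List[int] = []    # entries written since the last compaction
    sizes: Dict[int, int] = {}
    for batch_id in range(num_batches):
        pending.extend([batch_id] * files_per_batch)
        if is_compaction_batch(batch_id, compact_interval):
            merged = compacted + pending
            if keep is not None:
                merged = [b for b in merged if keep(b, batch_id)]
            compacted, pending = merged, []
            sizes[batch_id] = len(compacted)
    return sizes


# Without a purge, every compact file carries all earlier entries.
unbounded = simulate_sink_log(30, 2, 10)
# With a hypothetical retention of the last 10 batches, the size stays flat.
bounded = simulate_sink_log(30, 2, 10, keep=lambda b, cur: cur - b < 10)
```

      In this model the compact file grows linearly with the number of batches unless some retention rule is applied at compaction time, which is the kind of purge this issue asks for.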

       


              People

              • Assignee: Unassigned
              • Reporter: Iqbal Singh (iqbal_khattra)
              • Votes: 5
              • Watchers: 16
