  1. Spark
  2. SPARK-17119

Add configuration property to allow the history server to delete .inprogress files

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:

      Description

      The History Server (HS) currently considers only completed applications when deleting event logs from spark.history.fs.logDirectory (since SPARK-6879). As a result, .inprogress files (from failed jobs, jobs where the SparkContext is not closed, spark-shell exits, etc.) can accumulate over time and impact the HS.

      Instead of having to delete these files manually, maybe users could have the option of telling the HS to delete all files where (now - attempt.lastUpdated) > spark.history.fs.cleaner.maxAge, or to delete only .inprogress files whose lastUpdated is older than 7d?

      https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L467
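
      The age check proposed above could look roughly like the following. This is a minimal sketch in Python for illustration only, not Spark's Scala implementation: LogEntry, select_expired, and the include_in_progress flag are hypothetical names, not existing Spark APIs or configuration properties, and the 7-day default simply mirrors the figure mentioned in the description.

```python
from dataclasses import dataclass

# Mirrors the "7d" figure from the description (milliseconds).
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical stand-in for one event log file in the log directory."""
    path: str
    last_updated_ms: int  # last update time, epoch milliseconds


def select_expired(entries, now_ms, max_age_ms=SEVEN_DAYS_MS,
                   include_in_progress=False):
    """Return paths of logs whose age exceeds max_age_ms.

    With include_in_progress=False, .inprogress files are never selected,
    which matches the behavior the issue describes. With True, they are
    subject to the same (now - lastUpdated) > maxAge rule.
    """
    expired = []
    for entry in entries:
        in_progress = entry.path.endswith(".inprogress")
        if in_progress and not include_in_progress:
            continue  # skip in-progress logs unless the option is enabled
        if now_ms - entry.last_updated_ms > max_age_ms:
            expired.append(entry.path)
    return expired
```

      Making the behavior opt-in keeps the default unchanged for deployments that rely on long-running applications whose logs legitimately stay in progress.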

              People

              • Assignee: Unassigned
              • Reporter: Bjorn Jonsson (bjornjons)
              • Votes: 0
              • Watchers: 6
