Spark / SPARK-5836

Highlight in Spark documentation that by default Spark does not delete its temporary files


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Fix Version/s: None
    • Affects Version/s: 1.3.1, 1.4.0
    • Component/s: Documentation
    • Labels: None

    Description

      We recently learnt the hard way (in a production system) that, by default, Spark does not delete its temporary files until the application is stopped. Within a relatively short span of heavy Spark use, the disk of our production machine filled up completely with the shuffle files written to it. This fact deserves better documentation: a finished job leaves a lot of rubbish behind, and that should not come as a surprise.

      A good place to highlight this would be the documentation of the spark.local.dir property, which controls where Spark writes its temporary files.
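      As a minimal sketch of the property in question, spark.local.dir can be pointed at one or more volumes with enough headroom for shuffle output and spilled data; the directory paths below are illustrative, not taken from the original report:

```shell
# conf/spark-defaults.conf -- scratch space for shuffle files and spills.
# A comma-separated list spreads I/O across multiple disks.
spark.local.dir  /mnt/disk1/spark-tmp,/mnt/disk2/spark-tmp

# The same setting can also be passed per application at submit time
# (my_app.jar is a placeholder):
spark-submit --conf spark.local.dir=/mnt/big-disk/spark-tmp my_app.jar
```

      Note that on cluster managers such as YARN or Standalone, the manager's own setting (e.g. the SPARK_LOCAL_DIRS or LOCAL_DIRS environment variables) overrides this property, and the files are only cleaned up when the SparkContext is stopped.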

      Attachments

      Issue Links

      Activity

      People

        Assignee: ilganeli Ilya Ganelin
        Reporter: tomasz Tomasz Dudziak
        Votes: 1
        Watchers: 14

      Dates

        Created:
        Updated:
        Resolved: