
SPARK-7953: Spark should clean up output dir if job fails


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      MapReduce (MR) calls abortTask and abortJob on the OutputCommitter to clean up the temporary output directories, but Spark doesn't seem to do that when outputting an RDD to a Hadoop FS.

      For example, PairRDDFunctions.saveAsNewAPIHadoopDataset should call committer.abortTask(hadoopContext) in the finally block inside the writeShard closure, and jobCommitter.abortJob(jobTaskContext, JobStatus.State.FAILED) should be called if the job as a whole fails (sketched below).
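      A minimal sketch of the task-side fix, assuming a writeShard-style helper (the runTask wrapper and the writeRecords parameter are hypothetical stand-ins, not Spark's actual code):

        import org.apache.hadoop.mapreduce.{OutputCommitter, TaskAttemptContext}

        // Hypothetical wrapper around the per-partition write logic in
        // writeShard: commit the task attempt on success, and abort it in
        // the finally block on failure so its temporary output is deleted.
        def runTask(committer: OutputCommitter,
                    hadoopContext: TaskAttemptContext)
                   (writeRecords: => Unit): Unit = {
          committer.setupTask(hadoopContext)
          var committed = false
          try {
            writeRecords
            committer.commitTask(hadoopContext)
            committed = true
          } finally {
            if (!committed) {
              committer.abortTask(hadoopContext)
            }
          }
        }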

      Additionally, MR removes the output dir if the job fails, but Spark doesn't (see the job-level sketch below).
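      A corresponding driver-side sketch for the job-level cleanup (names again hypothetical; in the new MR API, TaskAttemptContext extends JobContext, so the same jobTaskContext can be passed to the job-level committer methods):

        import org.apache.hadoop.mapreduce.{JobStatus, OutputCommitter, TaskAttemptContext}

        // Hypothetical driver-side wrapper: if any task fails, abort the
        // whole job so the committer removes the temporary output
        // directory instead of leaving it behind, as MR does.
        def runJob(jobCommitter: OutputCommitter,
                   jobTaskContext: TaskAttemptContext)
                  (runAllTasks: => Unit): Unit = {
          jobCommitter.setupJob(jobTaskContext)
          try {
            runAllTasks
            jobCommitter.commitJob(jobTaskContext)
          } catch {
            case e: Throwable =>
              jobCommitter.abortJob(jobTaskContext, JobStatus.State.FAILED)
              throw e
          }
        }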


People

    Assignee: Unassigned
    Reporter: Mohit Sabharwal (mohitsabharwal)
    Votes: 0
    Watchers: 9

Dates

    Created:
    Updated:
    Resolved: