Spark / SPARK-7953

Spark should clean up output dir if job fails


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:

      Description

      MR calls abortTask and abortJob on the OutputCommitter to clean up the temporary output directories, but Spark doesn't seem to do that when outputting an RDD to a Hadoop FS.

      For example, PairRDDFunctions.saveAsNewAPIHadoopDataset should call committer.abortTask(hadoopContext) in the finally block inside the writeShard closure, and jobCommitter.abortJob(jobTaskContext, JobStatus.State.FAILED) should be called if the job fails (see the sketch below).

      Additionally, MR removes the output dir if job fails, but Spark doesn't.
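
      A minimal sketch of the proposed cleanup, in Scala against the Hadoop mapreduce API. The writeShard/runJob wrappers and the writeRecords/submitTasks parameters are hypothetical simplifications of what saveAsNewAPIHadoopDataset does, kept only to show where abortTask and abortJob would be called:

          import org.apache.hadoop.mapreduce.{JobStatus, OutputCommitter, TaskAttemptContext}

          // Task side: if writing the shard fails, abort the task attempt so the
          // committer can delete its temporary output instead of leaving it behind.
          def writeShard(hadoopContext: TaskAttemptContext,
                         committer: OutputCommitter,
                         writeRecords: () => Unit): Unit = {
            committer.setupTask(hadoopContext)
            try {
              writeRecords()
              committer.commitTask(hadoopContext)
            } catch {
              case e: Throwable =>
                committer.abortTask(hadoopContext) // proposed task-level cleanup
                throw e
            }
          }

          // Driver side: if the job fails, abort it so the committer (e.g.
          // FileOutputCommitter) can remove the temporary/partial output directory.
          def runJob(jobCommitter: OutputCommitter,
                     jobTaskContext: TaskAttemptContext,
                     submitTasks: () => Unit): Unit = {
            jobCommitter.setupJob(jobTaskContext)
            try {
              submitTasks()
              jobCommitter.commitJob(jobTaskContext)
            } catch {
              case e: Throwable =>
                jobCommitter.abortJob(jobTaskContext, JobStatus.State.FAILED) // proposed job-level cleanup
                throw e
            }
          }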




            People

            • Assignee: Unassigned
            • Reporter: Mohit Sabharwal (mohitsabharwal)

              Dates

              • Created:
              • Updated:
              • Resolved:
