Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 1.3.0
- Fix Version/s: None
Description
MR calls abortTask and abortJob on the OutputCommitter to clean up the temporary output directories, but Spark does not seem to do that when writing an RDD out to a Hadoop FS.
For example, PairRDDFunctions.saveAsNewAPIHadoopDataset should call committer.abortTask(hadoopContext) in the finally block inside the writeShard closure, and jobCommitter.abortJob(jobTaskContext, JobStatus.State.FAILED) should be called if the job fails. Sketches of both follow.
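A minimal sketch of the task-level fix, written against the Hadoop new-API classes. The names writer, committer, and hadoopContext mirror those used inside the writeShard closure; the wrapper function itself is hypothetical, not Spark's actual code:

{code:scala}
import org.apache.hadoop.mapreduce.{OutputCommitter, RecordWriter, TaskAttemptContext}

// Hypothetical wrapper showing the proposed cleanup: if the task did not
// commit, abort it so its temporary output directory is removed.
def writeShardWithAbort[K, V](
    iter: Iterator[(K, V)],
    writer: RecordWriter[K, V],
    committer: OutputCommitter,
    hadoopContext: TaskAttemptContext): Unit = {
  var committed = false
  try {
    while (iter.hasNext) {
      val (k, v) = iter.next()
      writer.write(k, v)
    }
    writer.close(hadoopContext)
    committer.commitTask(hadoopContext)
    committed = true
  } finally {
    // Proposed fix: mirror MR by aborting the task attempt on any failure,
    // instead of leaving its temporary output directory behind.
    if (!committed) {
      committer.abortTask(hadoopContext)
    }
  }
}
{code}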
Additionally, MR removes the output directory if the job fails, but Spark does not.
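A corresponding sketch of the job-level behavior being requested, assuming a jobSucceeded flag is available at the call site. jobCommitter and jobTaskContext mirror the names in saveAsNewAPIHadoopDataset, while finalizeJob is a hypothetical helper:

{code:scala}
import org.apache.hadoop.mapreduce.{JobContext, JobStatus, OutputCommitter}

// Hypothetical helper: commit the job on success; otherwise abort it so the
// committer discards temporary and partial output, as MR does.
def finalizeJob(
    jobCommitter: OutputCommitter,
    jobTaskContext: JobContext,
    jobSucceeded: Boolean): Unit = {
  if (jobSucceeded) {
    jobCommitter.commitJob(jobTaskContext)
  } else {
    // JobStatus.State.FAILED tells the committer the job failed, so it can
    // clean up the job's output directory.
    jobCommitter.abortJob(jobTaskContext, JobStatus.State.FAILED)
  }
}
{code}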
Issue Links
- blocks PIG-4243 Fix "TestStore" for Spark engine (Closed)