Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.23.3, 2.0.1-alpha, 2.4.1
- Hadoop Flags: Reviewed
Description
If a job generates many files to commit, the commitJob call at the end of the job can take minutes. This is a performance regression from 1.x, where tasks committed directly to the final output directory as they completed, leaving commitJob very little to do. In 1.x the commit work was therefore done in parallel and overlapped with the processing of outstanding tasks. In 0.23/2.x, the commit is single-threaded and does not start until all tasks have completed.
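To make the difference concrete, here is a rough sketch of the two commit strategies described above. It uses plain java.nio on the local filesystem as a stand-in and is not Hadoop's FileOutputCommitter code. The class name `CommitSketch`, its method names, and the `_temporary/task_N` layout are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only: a local-filesystem stand-in, not Hadoop's FileOutputCommitter. */
final class CommitSketch {

    /** Creates `tasks` hypothetical task directories, each holding `filesPerTask` empty files. */
    static List<Path> writeTaskOutputs(Path root, int tasks, int filesPerTask) throws IOException {
        List<Path> dirs = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            Path dir = Files.createDirectories(root.resolve("_temporary").resolve("task_" + t));
            for (int f = 0; f < filesPerTask; f++) {
                Files.write(dir.resolve("part-" + t + "-" + f), new byte[0]);
            }
            dirs.add(dir);
        }
        return dirs;
    }

    /** Moves every file directly under taskDir into finalDir and returns how many were moved. */
    static int promote(Path taskDir, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        List<Path> files;
        try (Stream<Path> listing = Files.list(taskDir)) {
            files = listing.collect(Collectors.toList()); // snapshot before moving anything
        }
        for (Path file : files) {
            Files.move(file, finalDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        return files.size();
    }

    /** 0.23/2.x style as described: one thread promotes everything after all tasks are done. */
    static int commitJobDeferred(List<Path> taskDirs, Path finalDir) throws IOException {
        int moved = 0;
        for (Path taskDir : taskDirs) {
            moved += promote(taskDir, finalDir);
        }
        return moved;
    }

    /** 1.x style as described: each task promotes its own output, so the moves run in parallel. */
    static int commitTasksInParallel(List<Path> taskDirs, Path finalDir) {
        return taskDirs.parallelStream().mapToInt(dir -> {
            try {
                return promote(dir, finalDir);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }).sum();
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("commit-sketch");
        List<Path> dirs = writeTaskOutputs(root, 8, 100);
        long start = System.nanoTime();
        int moved = commitJobDeferred(dirs, root.resolve("out"));
        System.out.println("serial commit moved " + moved + " files in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

With the deferred strategy, the cost of the final step grows with the total number of output files, because one thread performs every move. With the per-task strategy the same moves are spread across tasks and little is left for the job-level commit. On a local disk the gap will be small; the effect described in this issue comes from per-file rename latency on the real filesystem.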
Attachments
Issue Links
- is depended upon by
  - MAPREDUCE-6608 Work Preserving AM Restart for MapReduce (Open)
  - MAPREDUCE-6336 Enable v2 FileOutputCommitter by default (Resolved)
- is related to
  - MAPREDUCE-5485 Allow repeating job commit by extending OutputCommitter API (Closed)
- relates to
  - MAPREDUCE-6275 Race condition in FileOutputCommitter v2 for user-specified task output subdirs (Closed)