[MAPREDUCE-6956] FileOutputCommitter to gain abstract superclass PathOutputCommitter - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.0.0-beta1
Fix Version/s: 3.0.0-beta1
Component/s: mrv2
Labels: None
Target Version/s: 3.0.0-beta1

Description

This is the initial step of MAPREDUCE-6823, which proposes a factory behind FileOutputFormat to create different committers for different filesystems, if so configured.

This patch simply adds PathOutputCommitter, a new abstract superclass of FileOutputCommitter which itself extends OutputCommitter. The new class declares getWorkPath() as an abstract method, with FileOutputCommitter as its implementation.
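
A minimal sketch of the resulting hierarchy. These are simplified stand-ins, not the real Hadoop classes: the real OutputCommitter methods take JobContext/TaskAttemptContext arguments, declare IOException and include further methods such as commitJob(), and the real FileOutputCommitter work path is more deeply nested than the one shown.

```java
import java.nio.file.Path;

// Simplified stand-in for org.apache.hadoop.mapreduce.OutputCommitter.
// The real methods take context arguments and declare IOException.
abstract class OutputCommitter {
  abstract void setupJob();
  abstract void setupTask();
  abstract boolean needsTaskCommit();
  abstract void commitTask();
  abstract void abortTask();
}

// The new intermediate layer: its only extra obligation is to expose the
// directory into which a task attempt writes its output.
abstract class PathOutputCommitter extends OutputCommitter {
  abstract Path getWorkPath();
}

// Stand-in for FileOutputCommitter, now one concrete PathOutputCommitter.
// The work path layout here is deliberately simplified.
class FileOutputCommitter extends PathOutputCommitter {
  private final Path outputPath;
  private final String taskAttemptId;

  FileOutputCommitter(Path outputPath, String taskAttemptId) {
    this.outputPath = outputPath;
    this.taskAttemptId = taskAttemptId;
  }

  @Override
  Path getWorkPath() {
    return outputPath.resolve("_temporary").resolve(taskAttemptId);
  }

  // Commit protocol bodies omitted; the real class renames task output into place.
  @Override void setupJob() {}
  @Override void setupTask() {}
  @Override boolean needsTaskCommit() { return true; }
  @Override void commitTask() {}
  @Override void abortTask() {}
}
```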

FileOutputFormat then relaxes its requirement on the committer returned by getOutputCommitter(): instead of requiring a FileOutputCommitter or subclass, it only needs a PathOutputCommitter, and calls PathOutputCommitter.getWorkPath() to get the work path.
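
A sketch of what that relaxation means for the code building a task's output file, again with simplified stand-in types. WorkFiles.defaultWorkFile() and StagingCommitter are hypothetical names; rejecting a non-PathOutputCommitter with IllegalArgumentException is an assumption of this sketch, not necessarily what FileOutputFormat does.

```java
import java.nio.file.Path;

// Simplified stand-ins: the real methods take a TaskAttemptContext and declare IOException.
abstract class OutputCommitter {
  abstract void commitTask();
}

abstract class PathOutputCommitter extends OutputCommitter {
  abstract Path getWorkPath();
}

// Hypothetical committer which is NOT a FileOutputCommitter subclass.
class StagingCommitter extends PathOutputCommitter {
  private final Path stagingDir;

  StagingCommitter(Path stagingDir) {
    this.stagingDir = stagingDir;
  }

  @Override Path getWorkPath() { return stagingDir; }
  @Override void commitTask() { /* store-specific commit omitted */ }
}

final class WorkFiles {
  private WorkFiles() {}

  // Hypothetical helper: the committer only has to be a PathOutputCommitter,
  // so any committer exposing a work path can be used by the output format.
  static Path defaultWorkFile(OutputCommitter committer, String fileName) {
    if (!(committer instanceof PathOutputCommitter)) {
      throw new IllegalArgumentException("Committer "
          + committer.getClass().getName() + " is not a PathOutputCommitter");
    }
    return ((PathOutputCommitter) committer).getWorkPath().resolve(fileName);
  }
}
```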

What does that do?

It allows people to implement subclasses of FileOutputFormat which provide their own committers, without those committers having to inherit the complexity that FileOutputCommitter has acquired over time.

Currently anyone implementing a new committer (for example, the Netflix S3 committer) needs to subclass FileOutputCommitter. That class is too complex to understand except under a debugger: it has co-recursive routines and many methods which must be overridden to guarantee a safe subclass. Because of its critical role and its known subclasses, it is never going to be cleaned up.

A new, lean parent class which FileOutputFormat can handle allows people to write committers which don't have to worry about the implementation details of FileOutputCommitter, only about how well they implement the semantics of committing work.
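
To illustrate how small such a committer can be, here is a hypothetical MoveOnCommitCommitter written directly against simplified stand-ins for OutputCommitter and PathOutputCommitter. It assumes tasks write a flat set of files into a private directory, moves them into the output directory on commitTask() and deletes them on abortTask(). It is rename-based and only shows the shape of a lean subclass; it is not the S3 committer design.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Simplified stand-ins (no context arguments, unchecked I/O failures).
abstract class OutputCommitter {
  abstract void setupTask();
  abstract boolean needsTaskCommit();
  abstract void commitTask();
  abstract void abortTask();
}

abstract class PathOutputCommitter extends OutputCommitter {
  abstract Path getWorkPath();
}

// Hypothetical lean committer: no FileOutputCommitter inheritance at all.
class MoveOnCommitCommitter extends PathOutputCommitter {
  private final Path outputDir;
  private final Path taskDir;

  MoveOnCommitCommitter(Path outputDir, String taskAttemptId) {
    this.outputDir = outputDir;
    this.taskDir = outputDir.resolve("_pending").resolve(taskAttemptId);
  }

  @Override Path getWorkPath() { return taskDir; }

  @Override void setupTask() {
    try {
      Files.createDirectories(taskDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Only commit if the task actually produced output.
  @Override boolean needsTaskCommit() {
    return !listTaskFiles().isEmpty();
  }

  // Publish each task file into the output directory, then drop the task dir.
  @Override void commitTask() {
    try {
      for (Path f : listTaskFiles()) {
        Files.move(f, outputDir.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
      }
      Files.deleteIfExists(taskDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Discard everything the task wrote.
  @Override void abortTask() {
    try {
      for (Path f : listTaskFiles()) {
        Files.delete(f);
      }
      Files.deleteIfExists(taskDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Snapshot the directory listing so files are not moved while it is being iterated.
  private List<Path> listTaskFiles() {
    if (!Files.isDirectory(taskDir)) {
      return Collections.emptyList();
    }
    try (Stream<Path> files = Files.list(taskDir)) {
      return files.collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```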

The full MAPREDUCE-6823 goes beyond this, changing FileOutputFormat to use a factory which creates filesystem-specific PathOutputCommitter instances. This patch doesn't include that, as it needs to be reviewed in the context of HADOOP-13786 and, ideally, at least one committer for another store, so that people can confirm "this factory model works".
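
The factory idea, reduced to its core: choose a PathOutputCommitter by the output path's URI scheme, falling back to FileOutputCommitter. Everything here is a hypothetical sketch; CommitterFactory, register() and StagingCommitter are illustrative names, not the API proposed in MAPREDUCE-6823, and the stand-in PathOutputCommitter omits the OutputCommitter methods.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified stand-in: OutputCommitter methods omitted, work path as a String.
abstract class PathOutputCommitter {
  abstract String getWorkPath();
}

// Default choice: the classic committer.
class FileOutputCommitter extends PathOutputCommitter {
  private final URI outputPath;

  FileOutputCommitter(URI outputPath) { this.outputPath = outputPath; }

  @Override String getWorkPath() { return outputPath + "/_temporary"; }
}

// Hypothetical store-specific committer which stages output on local disk.
class StagingCommitter extends PathOutputCommitter {
  StagingCommitter(URI outputPath) {}

  @Override String getWorkPath() { return "file:/tmp/staging"; }
}

// Hypothetical factory keyed by URI scheme; unregistered schemes
// (including a null scheme) fall back to FileOutputCommitter.
class CommitterFactory {
  private final Map<String, Function<URI, PathOutputCommitter>> byScheme = new HashMap<>();

  void register(String scheme, Function<URI, PathOutputCommitter> constructor) {
    byScheme.put(scheme, constructor);
  }

  PathOutputCommitter create(URI outputPath) {
    Function<URI, PathOutputCommitter> ctor =
        byScheme.getOrDefault(outputPath.getScheme(), FileOutputCommitter::new);
    return ctor.apply(outputPath);
  }
}
```

Because FileOutputFormat would only depend on PathOutputCommitter, adding a store is a registration rather than another FileOutputCommitter subclass.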

All I'm proposing here is to tune the committer class hierarchy in MRv2 so that people can implement committers more easily, and so that switching to the factory is easy once it is done. I'd like this in branch-3 from the outset, so that existing code which calls FileOutputFormat.getOutputCommitter() and casts the result to FileOutputCommitter just to call getWorkPath() can move to the new class across all of Hadoop 3.
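
A sketch of that migration for caller code, using simplified stand-in types. CallerCode, workPathBefore(), workPathAfter() and OtherStoreCommitter are hypothetical names. The "before" version fails with ClassCastException as soon as a format returns any committer other than a FileOutputCommitter; the "after" version works for every PathOutputCommitter.

```java
import java.nio.file.Path;

// Simplified stand-ins for the committer hierarchy.
abstract class OutputCommitter {}

abstract class PathOutputCommitter extends OutputCommitter {
  abstract Path getWorkPath();
}

class FileOutputCommitter extends PathOutputCommitter {
  private final Path workPath;

  FileOutputCommitter(Path workPath) { this.workPath = workPath; }

  @Override Path getWorkPath() { return workPath; }
}

// Hypothetical committer for another store, unrelated to FileOutputCommitter.
class OtherStoreCommitter extends PathOutputCommitter {
  private final Path workPath;

  OtherStoreCommitter(Path workPath) { this.workPath = workPath; }

  @Override Path getWorkPath() { return workPath; }
}

final class CallerCode {
  private CallerCode() {}

  // Before: downcast to FileOutputCommitter just to read the work path.
  static Path workPathBefore(OutputCommitter committer) {
    return ((FileOutputCommitter) committer).getWorkPath();
  }

  // After: depend only on PathOutputCommitter.
  static Path workPathAfter(OutputCommitter committer) {
    return ((PathOutputCommitter) committer).getWorkPath();
  }
}
```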

Attachments

- MAPREDUCE-6956-002.patch (24 kB, Steve Loughran, 15/Sep/17 15:16)
- MAPREDUCE-6956-001.patch (24 kB, Steve Loughran, 08/Sep/17 19:04)

Issue Links

is depended upon by:
- MAPREDUCE-6823 FileOutputFormat to support configurable PathOutputCommitter factory (Resolved)
- HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints (Resolved)

is related to:
- MAPREDUCE-6961 Pull up FileOutputCommitter.getOutputPath to PathOutputCommitter (Resolved)

relates to:
- MAPREDUCE-7060 Cherry Pick PathOutputCommitter class/factory to branch-3.0 (Resolved)

People

Assignee: Steve Loughran
Reporter: Steve Loughran
Votes: 0
Watchers: 4

Dates

Created: 08/Sep/17 18:05
Updated: 26/Feb/18 15:03
Resolved: 15/Sep/17 16:01