Description
In HADOOP-13786 I'm adding a custom subclass for FileOutputFormat, one which can talk directly to the S3A Filesystem for more efficient operations, better failure modes, and, most critically, as part of HADOOP-13345, atomic commit of output. The normal committer relies on directory rename() being atomic for this; for S3 we don't have that luxury.
To support a custom committer, we need to be able to tell FileOutputFormat (and, implicitly, all subclasses which don't have their own custom committer) to use our new S3AOutputCommitter.
I propose:
- FileOutputFormat takes a factory to create committers.
- The factory takes a URI and a TaskAttemptContext and returns a committer.
- The default implementation always returns a FileOutputCommitter.
- A configuration option allows a new factory to be named.
- An S3AOutputCommitterFactory returns a FileOutputCommitter or the new S3AOutputCommitter, depending upon the URI of the destination.
Note that MRv1 already supports configurable committers; this is only for the V2 API.
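The proposal above can be sketched in plain Java. This is only an illustration of the shape of the design, not the Hadoop API: every type here is a hypothetical stand-in (`Committer`, `FileCommitter`, `S3ACommitter`, `CommitterFactory`, and so on), the TaskAttemptContext is reduced to a string ID, the configuration is a plain `Map`, and the option name `sketch.committer.factory.class` is invented for the example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are real Hadoop classes.
class CommitterFactorySketch {

    /** Stand-in for an output committer; a real one has setup/commit/abort operations. */
    interface Committer {
        String name();
    }

    /** Stand-in for the rename-based FileOutputCommitter. */
    static final class FileCommitter implements Committer {
        public String name() { return "file"; }
    }

    /** Stand-in for the proposed S3AOutputCommitter. */
    static final class S3ACommitter implements Committer {
        public String name() { return "s3a"; }
    }

    /** The proposed factory: destination URI and task attempt in, committer out. */
    interface CommitterFactory {
        Committer createCommitter(URI destination, String taskAttemptId);
    }

    /** Default factory: always returns the file committer. */
    static class DefaultCommitterFactory implements CommitterFactory {
        public Committer createCommitter(URI destination, String taskAttemptId) {
            return new FileCommitter();
        }
    }

    /** S3A factory: picks the committer from the scheme of the destination URI. */
    static class S3ACommitterFactory implements CommitterFactory {
        public Committer createCommitter(URI destination, String taskAttemptId) {
            return "s3a".equalsIgnoreCase(destination.getScheme())
                    ? new S3ACommitter()
                    : new FileCommitter();
        }
    }

    /** Hypothetical configuration key naming the factory class. */
    static final String FACTORY_KEY = "sketch.committer.factory.class";

    /** Instantiates the configured factory, or the default if none is named. */
    static CommitterFactory loadFactory(Map<String, String> conf) {
        String className = conf.get(FACTORY_KEY);
        if (className == null) {
            return new DefaultCommitterFactory();
        }
        try {
            return Class.forName(className)
                    .asSubclass(CommitterFactory.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create factory " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FACTORY_KEY, S3ACommitterFactory.class.getName());
        CommitterFactory factory = loadFactory(conf);
        // Prints "s3a" for the S3A destination, then "file" for the HDFS one.
        System.out.println(factory.createCommitter(URI.create("s3a://bucket/out"), "attempt_0").name());
        System.out.println(factory.createCommitter(URI.create("hdfs://namenode/out"), "attempt_0").name());
    }
}
```

With no factory named in the configuration, `loadFactory` falls back to the default and every destination gets the file committer, which matches the "default implementation always returns a FileOutputCommitter" point in the list.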
Attachments
Issue Links
- depends upon
  - MAPREDUCE-6956 FileOutputCommitter to gain abstract superclass PathOutputCommitter (Resolved)
- is depended upon by
  - HADOOP-14584 WASB to support high-performance commit protocol (Resolved)
  - HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints (Resolved)
- is duplicated by
  - HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints (Resolved)
- is part of
  - HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints (Resolved)
- is related to
  - MAPREDUCE-6961 Pull up FileOutputCommitter.getOutputPath to PathOutputCommitter (Resolved)