In HADOOP-13786 I'm adding a custom subclass for FileOutputFormat, one which can talk directly to the S3A filesystem for more efficient operations, better failure modes and, most critically, as part of HADOOP-13345, atomic commit of output. The normal committer relies on directory rename() being atomic for this; for S3 we don't have that luxury.
To support a custom committer, we need to be able to tell FileOutputFormat (and, implicitly, all subclasses which don't have their own custom committer) to use our new S3AOutputCommitter.
- FileOutputFormat takes a factory to create committers.
- The factory takes a URI and a TaskAttemptContext and returns a committer.
- The default implementation always returns a FileOutputCommitter.
- A configuration option allows a new factory to be named.
- An S3AOutputCommitterFactory returns a FileOutputCommitter or the new S3AOutputCommitter, depending upon the URI of the destination.
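The points above can be sketched as follows. This is only an illustration of the factory pattern, not the patch itself: every type here is a stand-in defined inside the sketch rather than a real Hadoop class, the configuration key `example.committer.factory.class` is hypothetical, the choice of the `s3a` URI scheme as the selection rule is an assumption, and a plain `Map` stands in for the job configuration.

```java
import java.net.URI;
import java.util.Map;

public class CommitterFactorySketch {

    /** Stand-in for an output committer; only its name matters in this sketch. */
    public interface Committer { String name(); }

    /** Stand-in for the existing rename-based committer. */
    public static final class FileCommitter implements Committer {
        public String name() { return "FileOutputCommitter"; }
    }

    /** Stand-in for the new S3A-specific committer. */
    public static final class S3ACommitter implements Committer {
        public String name() { return "S3AOutputCommitter"; }
    }

    /** Stand-in for TaskAttemptContext: just carries the configuration. */
    public static final class TaskContext {
        public final Map<String, String> conf;
        public TaskContext(Map<String, String> conf) { this.conf = conf; }
    }

    /** The factory: destination URI and task context in, committer out. */
    public interface CommitterFactory {
        Committer createCommitter(URI destination, TaskContext context);
    }

    /** Default behaviour: always the file committer, whatever the destination. */
    public static final class DefaultFactory implements CommitterFactory {
        public Committer createCommitter(URI destination, TaskContext context) {
            return new FileCommitter();
        }
    }

    /** S3A-aware factory: picks the committer from the destination's scheme (assumed rule). */
    public static final class S3AFactory implements CommitterFactory {
        public Committer createCommitter(URI destination, TaskContext context) {
            // getScheme() is null for relative paths; equalsIgnoreCase(null) is false.
            return "s3a".equalsIgnoreCase(destination.getScheme())
                    ? new S3ACommitter()
                    : new FileCommitter();
        }
    }

    /** Hypothetical configuration key naming the factory class. */
    public static final String FACTORY_KEY = "example.committer.factory.class";

    /** Instantiates the configured factory, falling back to the default one. */
    public static CommitterFactory loadFactory(Map<String, String> conf) {
        String className = conf.get(FACTORY_KEY);
        if (className == null) {
            return new DefaultFactory();
        }
        try {
            return (CommitterFactory) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create factory " + className, e);
        }
    }
}
```

With no factory configured, every destination gets the file committer, so existing jobs are unaffected; naming the S3A-aware factory changes the result only for destinations it recognises.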
Note that MRv1 already supports configurable committers; this change only affects the V2 API.