[MAPREDUCE-6823] FileOutputFormat to support configurable PathOutputCommitter factory - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.0.0-alpha2
Fix Version/s: None
Component/s: mrv2
Labels: None
Environment: Targeting S3 as the output of work
Target Version/s: 3.1.0

Description

In ~~HADOOP-13786~~ I'm adding a custom subclass for FileOutputFormat, one which can talk directly to the S3A filesystem for more efficient operations, better failure modes and, most critically, as part of ~~HADOOP-13345~~, atomic commit of output. The normal committer relies on directory rename() being atomic for this; for S3 we don't have that luxury.

To support a custom committer, we need to be able to tell FileOutputFormat (and, implicitly, all subclasses which don't have their own custom committer) to use our new S3AOutputCommitter.

I propose:

- FileOutputFormat takes a factory to create committers.
- The factory takes a URI and a TaskAttemptContext and returns a committer.
- The default implementation always returns a FileOutputCommitter.
- A configuration option allows a new factory to be named.
- An S3AOutputCommitterFactory returns a FileOutputCommitter or the new S3AOutputCommitter, depending upon the URI of the destination.
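
The proposal above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the attached patches: `Committer`, `Context`, `CommitterFactory`, `DefaultFactory`, `S3AFactory`, `loadFactory` and the configuration key `example.committer.factory.class` are all hypothetical stand-ins, chosen so the sketch compiles without Hadoop on the classpath. Committers are represented only by their names.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class CommitterFactorySketch {

    /** Hypothetical stand-in for an output committer; only its name matters here. */
    public interface Committer {
        String name();
    }

    /** Hypothetical stand-in for TaskAttemptContext: carries configuration only. */
    public static final class Context {
        public final Map<String, String> conf = new HashMap<>();
    }

    /** The proposed factory: destination URI plus context in, committer out. */
    public interface CommitterFactory {
        Committer create(URI output, Context context);
    }

    /** Default behaviour: always hand back the classic file committer. */
    public static final class DefaultFactory implements CommitterFactory {
        @Override
        public Committer create(URI output, Context context) {
            return () -> "FileOutputCommitter";
        }
    }

    /** Chooses a committer from the scheme of the destination URI. */
    public static final class S3AFactory implements CommitterFactory {
        @Override
        public Committer create(URI output, Context context) {
            if ("s3a".equalsIgnoreCase(output.getScheme())) {
                return () -> "S3AOutputCommitter";
            }
            // Any other filesystem falls back to the default committer.
            return new DefaultFactory().create(output, context);
        }
    }

    /** Hypothetical configuration key naming the factory class. */
    public static final String FACTORY_KEY = "example.committer.factory.class";

    /** Instantiates the configured factory, or the default if none is named. */
    public static CommitterFactory loadFactory(Context context) {
        String className = context.conf.get(FACTORY_KEY);
        if (className == null) {
            return new DefaultFactory();
        }
        try {
            return (CommitterFactory) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "Cannot instantiate committer factory " + className, e);
        }
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.conf.put(FACTORY_KEY, S3AFactory.class.getName());
        CommitterFactory factory = loadFactory(context);
        System.out.println(factory.create(URI.create("s3a://bucket/out"), context).name());
        System.out.println(factory.create(URI.create("hdfs://nn/out"), context).name());
    }
}
```

Keeping the choice of committer behind a factory named in configuration means FileOutputFormat subclasses that do not define their own committer would pick up the new behaviour without code changes, which is the point of the proposal.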

Note that MRv1 already supports configurable committers; this change is only for the V2 API.

Attachments

- MAPREDUCE-6823-004.patch, 06/Nov/17 17:33, 46 kB, Steve Loughran
- MAPREDUCE-6823-002.patch, 13/Oct/17 11:48, 43 kB, Steve Loughran
- MAPREDUCE-6823-002.patch, 16/Oct/17 16:20, 43 kB, Steve Loughran
- HADOOP-13786-HADOOP-13345-001.patch, 16/Dec/16 19:41, 126 kB, Steve Loughran

Issue Links

depends upon
- MAPREDUCE-6956 FileOutputCommitter to gain abstract superclass PathOutputCommitter (Resolved)

is depended upon by
- HADOOP-14584 WASB to support high-performance commit protocol (Resolved)
- HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints (Resolved)

is duplicated by
- HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints (Resolved)

is part of
- HADOOP-13786 Add S3A committers for zero-rename commits to S3 endpoints (Resolved)

is related to
- MAPREDUCE-6961 Pull up FileOutputCommitter.getOutputPath to PathOutputCommitter (Resolved)

People

Assignee: Steve Loughran
Reporter: Steve Loughran
Votes: 0
Watchers: 6

Dates

Created: 14/Dec/16 16:25
Updated: 22/Mar/18 06:07
Resolved: 22/Mar/18 06:07