  1. Hadoop Map/Reduce
  2. MAPREDUCE-6823

FileOutputFormat to support configurable PathOutputCommitter factory



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0-alpha2
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None
    • Environment: Targeting S3 as the output of work


      In HADOOP-13786 I'm adding a custom subclass for FileOutputFormat, one which can talk directly to the S3A Filesystem for more efficient operations, better failure modes and, most critically, as part of HADOOP-13345, atomic commit of output. The normal committer relies on directory rename() being atomic for this; for S3 we don't have that luxury.

      To support a custom committer, we need to be able to tell FileOutputFormat (and, implicitly, all subclasses which don't have their own custom committer) to use our new S3AOutputCommitter.

      I propose:

      1. FileOutputFormat takes a factory to create committers.
      2. The factory takes a URI and a TaskAttemptContext and returns a committer.
      3. The default implementation always returns a FileOutputCommitter.
      4. A configuration option allows a new factory to be named.
      5. An S3AOutputCommitterFactory returns a FileOutputCommitter or the new S3AOutputCommitter, depending upon the URI of the destination.
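
      The five points above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the patch itself: the Committer and CommitterFactory interfaces, the DefaultFactory and S3AFactory classes, the FACTORY_KEY configuration key, and the use of a plain Map in place of the Hadoop Configuration and TaskAttemptContext are all hypothetical stand-ins chosen so that the example runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Sketch only: none of the types below are real Hadoop classes. */
final class CommitterFactorySketch {

    /** Stand-in for an output committer; only its name matters here. */
    interface Committer {
        String name();
    }

    /** Point 2: a factory takes the destination URI and a task context. */
    interface CommitterFactory {
        Committer create(URI destination, Map<String, String> taskContext);
    }

    /** Point 3: the default factory always returns the file committer. */
    static class DefaultFactory implements CommitterFactory {
        @Override
        public Committer create(URI destination, Map<String, String> taskContext) {
            return () -> "FileOutputCommitter";
        }
    }

    /** Point 5: choose the committer from the destination URI scheme. */
    static class S3AFactory implements CommitterFactory {
        @Override
        public Committer create(URI destination, Map<String, String> taskContext) {
            if ("s3a".equalsIgnoreCase(destination.getScheme())) {
                return () -> "S3AOutputCommitter";
            }
            return () -> "FileOutputCommitter";
        }
    }

    /** Point 4: hypothetical configuration key naming the factory class. */
    static final String FACTORY_KEY = "example.outputcommitter.factory.class";

    /** Point 1: the output format asks for a factory instead of hard-coding one. */
    static CommitterFactory loadFactory(Map<String, String> conf) {
        String className = conf.get(FACTORY_KEY);
        if (className == null) {
            return new DefaultFactory();
        }
        try {
            return (CommitterFactory) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create factory " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FACTORY_KEY, S3AFactory.class.getName());
        CommitterFactory factory = loadFactory(conf);
        System.out.println(factory.create(URI.create("s3a://bucket/out"), conf).name());
        System.out.println(factory.create(URI.create("hdfs://nn/out"), conf).name());
    }
}
```

      With the factory named in the configuration, an s3a:// destination gets the S3A committer while any other destination falls back to the file committer; with no option set, every destination gets the file committer, so existing jobs would behave as before.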

      Note that MRv1 already supports configurable committers; this change applies only to the V2 API.


        1. HADOOP-13786-HADOOP-13345-001.patch
          126 kB
          Steve Loughran
        2. MAPREDUCE-6823-002.patch
          43 kB
          Steve Loughran
        3. MAPREDUCE-6823-002.patch
          43 kB
          Steve Loughran
        4. MAPREDUCE-6823-004.patch
          46 kB
          Steve Loughran

        Assignee: Steve Loughran (stevel@apache.org)
        Reporter: Steve Loughran (stevel@apache.org)