Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-6956

FileOutputCommitter to gain abstract superclass PathOutputCommitter

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-beta1
    • Fix Version/s: 3.0.0-beta1
    • Component/s: mrv2
    • Labels:
      None

      Description

      This is the initial step of MAPREDUCE-6823, which proposes a factory behind FileOutputFormat to create different committers for different filesystems, if so configured..

      This patch simply adds the new abstract superclass of FileOutputCommitter, PathOutputCommitter extends OutputCommitter. This abstract class adds the getWorkPath() method as an abstract method, with FIleOutputCommitter being the implementation..

      FileOutputFormat then relaxes its requirement of any committer returned by getOutputCommitter(), so that instead of requiring a FileOutputCommitter or subclass, it only needs a PathOutputCommitter, using PathOutputCommitter.getWorkPath() to get the work path.

      What does that do?

      It allows people to implement subclasses of FileOutputFormat which can provide their own committers which don't need to inherit the complexity that FileOutputCommitter has acquired over time

      Currently anyone implementing a new committer (example: Netflix S3 committer) needs to subclass FileOutputCommitter, which is too complex to understand except under a debugger with co-recursive routines, lots of methods which need to be overwritten to guarantee a safe subclass, and, because of its critical role and known subclassing, something which isn't ever going to be cleaned up.

      A new, lean, parent class which FileOutputFormat can handle allows people to write new committers which don't have to worry about implementation details of FileOutputCommitter, but instead how well they implement the semantics of committing work.

      The full MAPREDUCE-6823 goes beyond this with a change to FileOutputFormat for a factory for creating FS-specific PathOutputCommitter instances. This patch doesn't include that, as that is something which needs to be reviewed in the context of HADOOP-13786 and ideally 1+ committer for another store, so people can say "this factory model works".

      All I'm proposing here is: tune the committer class hierarchy in MRv2 so that people can more easily implement committers, and when that factory is done, for it to be switched to easily. And I'd like this in branch-3 from the outset, so existing code which calls FileOutputFormat.getCommitter() to get a FileOutputCommitter just to call getWorkPath() can move to the new interface across all of Hadoop 3.

      1. MAPREDUCE-6956-001.patch
        24 kB
        Steve Loughran
      2. MAPREDUCE-6956-002.patch
        24 kB
        Steve Loughran

        Issue Links

          Activity

          Hide
          stevel@apache.org Steve Loughran added a comment -

          Patch 001

          This is the code from MAPREDUC-6823 excluding the factory itself and adding a unit test of assignment and resolution.

          As well as the new interface,

          • it adds toString() values of FileOutputCommitter, JobContextImpl, TaskAttemptContextImpl. Invaluable during debugging sessions, logs and assertions.
          • FileOutputCommitter adds an explicit check for task attempts having a null task attempt path.
          • makes changes to FileOutputFormat so as to stop checkstyle complaining (tabs, some line endings, javadocs).

          I've been using this code in HADOOP-13768 for a while, this is just the core visible change which simplifes the work for anyone trying to implement committers. FileOutputFormat and downstream (e.g. org.apache.spark.internal.io.HadoopMapReduceCommitProtocol are casting to FileOutputCommitter when they need a workpath, and they benefit from the ability to use something a bit more abstract

          Show
          stevel@apache.org Steve Loughran added a comment - Patch 001 This is the code from MAPREDUC-6823 excluding the factory itself and adding a unit test of assignment and resolution. As well as the new interface, it adds toString() values of FileOutputCommitter, JobContextImpl, TaskAttemptContextImpl. Invaluable during debugging sessions, logs and assertions. FileOutputCommitter adds an explicit check for task attempts having a null task attempt path. makes changes to FileOutputFormat so as to stop checkstyle complaining (tabs, some line endings, javadocs). I've been using this code in HADOOP-13768 for a while, this is just the core visible change which simplifes the work for anyone trying to implement committers. FileOutputFormat and downstream (e.g. org.apache.spark.internal.io.HadoopMapReduceCommitProtocol are casting to FileOutputCommitter when they need a workpath, and they benefit from the ability to use something a bit more abstract
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 39s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 14m 34s trunk passed
          +1 compile 0m 26s trunk passed
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 29s trunk passed
          +1 findbugs 0m 51s trunk passed
          +1 javadoc 0m 22s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 24s the patch passed
          +1 compile 0m 24s the patch passed
          -1 javac 0m 24s hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core generated 3 new + 162 unchanged - 0 fixed = 165 total (was 162)
          -1 checkstyle 0m 17s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 8 new + 94 unchanged - 12 fixed = 102 total (was 106)
          +1 mvnsite 0m 26s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 54s the patch passed
          +1 javadoc 0m 19s the patch passed
                Other Tests
          +1 unit 2m 39s hadoop-mapreduce-client-core in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          24m 2s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:71bbb86
          JIRA Issue MAPREDUCE-6956
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12886146/MAPREDUCE-6956-001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e94ff8f0656f 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c35510a
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          javac https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7125/artifact/patchprocess/diff-compile-javac-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7125/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7125/testReport/
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7125/console
          Powered by Apache Yetus 0.5.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 39s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.       trunk Compile Tests +1 mvninstall 14m 34s trunk passed +1 compile 0m 26s trunk passed +1 checkstyle 0m 21s trunk passed +1 mvnsite 0m 29s trunk passed +1 findbugs 0m 51s trunk passed +1 javadoc 0m 22s trunk passed       Patch Compile Tests +1 mvninstall 0m 24s the patch passed +1 compile 0m 24s the patch passed -1 javac 0m 24s hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core generated 3 new + 162 unchanged - 0 fixed = 165 total (was 162) -1 checkstyle 0m 17s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 8 new + 94 unchanged - 12 fixed = 102 total (was 106) +1 mvnsite 0m 26s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 54s the patch passed +1 javadoc 0m 19s the patch passed       Other Tests +1 unit 2m 39s hadoop-mapreduce-client-core in the patch passed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 24m 2s Subsystem Report/Notes Docker Image:yetus/hadoop:71bbb86 JIRA Issue MAPREDUCE-6956 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12886146/MAPREDUCE-6956-001.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux e94ff8f0656f 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / c35510a Default Java 1.8.0_144 findbugs v3.1.0-RC1 javac https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7125/artifact/patchprocess/diff-compile-javac-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7125/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7125/testReport/ modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7125/console Powered by Apache Yetus 0.5.0 http://yetus.apache.org This message was automatically generated.
          Hide
          jlowe Jason Lowe added a comment -

          I like the idea! It would be nice to cleanup the checkstyle and javac warnings but otherwise +1 lgtm.

          Show
          jlowe Jason Lowe added a comment - I like the idea! It would be nice to cleanup the checkstyle and javac warnings but otherwise +1 lgtm.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          great! thx for the review. I'll do the cleanup today. All the javac complaints are deprecation of subclassed methods, so I can't fix them. checkstyle: yes

          Show
          stevel@apache.org Steve Loughran added a comment - great! thx for the review. I'll do the cleanup today. All the javac complaints are deprecation of subclassed methods, so I can't fix them. checkstyle: yes
          Hide
          stevel@apache.org Steve Loughran added a comment -

          thanks, this is the patch I'm committing: checkstyle is now happy with the patch as far as I can tell

          Show
          stevel@apache.org Steve Loughran added a comment - thanks, this is the patch I'm committing: checkstyle is now happy with the patch as far as I can tell
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
                trunk Compile Tests
          +1 mvninstall 12m 41s trunk passed
          +1 compile 0m 22s trunk passed
          +1 checkstyle 0m 16s trunk passed
          +1 mvnsite 0m 26s trunk passed
          +1 findbugs 0m 43s trunk passed
          +1 javadoc 0m 18s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 21s the patch passed
          +1 compile 0m 21s the patch passed
          -1 javac 0m 21s hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core generated 3 new + 162 unchanged - 0 fixed = 165 total (was 162)
          +1 checkstyle 0m 15s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 0 new + 93 unchanged - 12 fixed = 93 total (was 105)
          +1 mvnsite 0m 24s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 51s the patch passed
          +1 javadoc 0m 16s the patch passed
                Other Tests
          +1 unit 2m 33s hadoop-mapreduce-client-core in the patch passed.
          +1 asflicense 0m 11s The patch does not generate ASF License warnings.
          20m 47s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:71bbb86
          JIRA Issue MAPREDUCE-6956
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12887361/MAPREDUCE-6956-002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux f1b77e584af6 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 78bdf10
          Default Java 1.8.0_144
          findbugs v3.1.0-RC1
          javac https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7139/artifact/patchprocess/diff-compile-javac-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7139/testReport/
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7139/console
          Powered by Apache Yetus 0.5.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 14s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.       trunk Compile Tests +1 mvninstall 12m 41s trunk passed +1 compile 0m 22s trunk passed +1 checkstyle 0m 16s trunk passed +1 mvnsite 0m 26s trunk passed +1 findbugs 0m 43s trunk passed +1 javadoc 0m 18s trunk passed       Patch Compile Tests +1 mvninstall 0m 21s the patch passed +1 compile 0m 21s the patch passed -1 javac 0m 21s hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core generated 3 new + 162 unchanged - 0 fixed = 165 total (was 162) +1 checkstyle 0m 15s hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 0 new + 93 unchanged - 12 fixed = 93 total (was 105) +1 mvnsite 0m 24s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 51s the patch passed +1 javadoc 0m 16s the patch passed       Other Tests +1 unit 2m 33s hadoop-mapreduce-client-core in the patch passed. +1 asflicense 0m 11s The patch does not generate ASF License warnings. 20m 47s Subsystem Report/Notes Docker Image:yetus/hadoop:71bbb86 JIRA Issue MAPREDUCE-6956 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12887361/MAPREDUCE-6956-002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux f1b77e584af6 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 78bdf10 Default Java 1.8.0_144 findbugs v3.1.0-RC1 javac https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7139/artifact/patchprocess/diff-compile-javac-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7139/testReport/ modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7139/console Powered by Apache Yetus 0.5.0 http://yetus.apache.org This message was automatically generated.
          Hide
          stevel@apache.org Steve Loughran added a comment -

          committed to 3.0-beta; now I can work on the next detail: what's a good API for letting people define different committers for different filesystems

          Show
          stevel@apache.org Steve Loughran added a comment - committed to 3.0-beta; now I can work on the next detail: what's a good API for letting people define different committers for different filesystems
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12883 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12883/)
          MAPREDUCE-6956 FileOutputCommitter to gain abstract superclass (stevel: rev 11390c2d111910b01d9c4d3e39dee49babae272f)

          • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.java
          • (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/PathOutputCommitter.java
          • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
          • (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestPathOutputCommitter.java
          • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.java
          • (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/JobContextImpl.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12883 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12883/ ) MAPREDUCE-6956 FileOutputCommitter to gain abstract superclass (stevel: rev 11390c2d111910b01d9c4d3e39dee49babae272f) (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.java (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/PathOutputCommitter.java (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java (add) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestPathOutputCommitter.java (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.java (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/JobContextImpl.java

            People

            • Assignee:
              stevel@apache.org Steve Loughran
              Reporter:
              stevel@apache.org Steve Loughran
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development