Hadoop Map/Reduce
MAPREDUCE-954

The new interface's Context objects should be interfaces

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: client
    • Labels:
      None
    • Release Note:
      Changed Map-Reduce context objects to be interfaces.

      Description

      When I was doing HADOOP-1230, I was persuaded to make the Context objects classes. I think that was a serious mistake. It caused a lot of information leakage into the public classes.
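
      To illustrate the leakage the description refers to: with a concrete context class, the constructor and every type in its signature become part of the public API; with an interface, only the contract is visible. A minimal sketch, with hypothetical signatures (not the actual Hadoop declarations):

          import java.io.IOException;

          // Sketch only -- hypothetical signatures, not the real Hadoop classes.
          // As a concrete class, the constructor drags internals into the public API:
          //
          //   public MapContext(Configuration conf, TaskAttemptID id,
          //                     RecordReader<KIN, VIN> reader /* , ... */) { ... }
          //
          // As an interface, user code sees the contract and nothing else:
          public interface MapContext<KIN, VIN, KOUT, VOUT> {
            boolean nextKeyValue() throws IOException, InterruptedException;
            KIN getCurrentKey() throws IOException, InterruptedException;
            VIN getCurrentValue() throws IOException, InterruptedException;
            void write(KOUT key, VOUT value) throws IOException, InterruptedException;
          }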

      Attachments

      1. MAPREDUCE-954.patch
        76 kB
        Arun C Murthy
      2. MAPREDUCE-954.patch
        103 kB
        Arun C Murthy
      3. MAPREDUCE-954.patch
        122 kB
        Arun C Murthy
      4. MAPREDUCE-954.patch
        139 kB
        Arun C Murthy
      5. MAPREDUCE-954.patch
        157 kB
        Arun C Murthy
      6. MAPREDUCE-954.patch
        159 kB
        Arun C Murthy
      7. MAPREDUCE-954.patch
        159 kB
        Arun C Murthy

        Issue Links

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-22-branch #83 (See https://builds.apache.org/job/Hadoop-Mapreduce-22-branch/83/)
          MAPREDUCE-3138. Add a utility to help applications bridge changes in Context Objects APIs due to MAPREDUCE-954. Contributed by omalley.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1180750
          Files :

          • /hadoop/common/branches/branch-0.22/mapreduce/CHANGES.txt
          • /hadoop/common/branches/branch-0.22/mapreduce/src/java/org/apache/hadoop/mapreduce/ContextFactory.java
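           The bridging utility referenced in the commit above exists so that application code need not compile against a specific release's context constructors. A purely hypothetical sketch of the reflection-based approach such a bridge can take (class and method names here are illustrative, not the actual ContextFactory API; the constructor signatures are assumptions):

               import java.lang.reflect.Constructor;
               import org.apache.hadoop.conf.Configuration;
               import org.apache.hadoop.mapreduce.TaskAttemptContext;
               import org.apache.hadoop.mapreduce.TaskAttemptID;

               public final class ContextBridge {  // hypothetical name
                 public static TaskAttemptContext newTaskAttemptContext(
                     Configuration conf, TaskAttemptID id) throws Exception {
                   Class<?> cls;
                   try {
                     // 0.21+: contexts are interfaces; instantiate the impl class
                     cls = Class.forName(
                         "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl");
                   } catch (ClassNotFoundException e) {
                     // pre-MAPREDUCE-954: the context itself was a concrete class
                     cls = Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext");
                   }
                   Constructor<?> ctor =
                       cls.getConstructor(Configuration.class, TaskAttemptID.class);
                   return (TaskAttemptContext) ctor.newInstance(conf, id);
                 }
               }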
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #850 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/850/)
          MAPREDUCE-3138. Add a utility to help applications bridge changes in Context Objects APIs due to MAPREDUCE-954. Contributed by omalley.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178650
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/ContextFactory.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #820 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/820/)
          MAPREDUCE-3138. Add a utility to help applications bridge changes in Context Objects APIs due to MAPREDUCE-954. Contributed by omalley.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178650
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/ContextFactory.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1026 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1026/)
          MAPREDUCE-3138. Add a utility to help applications bridge changes in Context Objects APIs due to MAPREDUCE-954. Contributed by omalley.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178650
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/ContextFactory.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1084 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1084/)
          MAPREDUCE-3138. Add a utility to help applications bridge changes in Context Objects APIs due to MAPREDUCE-954. Contributed by omalley.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178650
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/ContextFactory.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1006 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1006/)
          MAPREDUCE-3138. Add a utility to help applications bridge changes in Context Objects APIs due to MAPREDUCE-954. Contributed by omalley.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178650
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/ContextFactory.java
          Tom White added a comment -

          Can we mark the interfaces as "Public Evolving"? This would overcome the problems with API evolvability.

          I've opened an issue to address this, which I would like to go into 0.21. See MAPREDUCE-1012.

          Thanks for addressing my other comments, Arun.
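           Hadoop's interface classification annotations (org.apache.hadoop.classification) are the mechanism Tom is proposing here; a sketch of what marking a context interface would look like (the interface body is elided):

               import org.apache.hadoop.classification.InterfaceAudience;
               import org.apache.hadoop.classification.InterfaceStability;

               // "Public Evolving": usable by any application, but the API may
               // change between minor releases -- which is what leaves room to
               // add methods to the interface later.
               @InterfaceAudience.Public
               @InterfaceStability.Evolving
               public interface JobContext {
                 // ...
               }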

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #58 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/58/)
           MAPREDUCE-954. Change Map-Reduce context objects to be interfaces. (acmurthy)

          Arun C Murthy added a comment -

          I just committed this.

          Arun C Murthy added a comment -

          Fixed javadoc warnings:

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 42 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec] 
          
          Arun C Murthy added a comment -

           > I'm not sure that WrappedMapper and WrappedReducer belong in a "lib" package, since the classes in "lib" are user-facing, and these are framework classes. (People might see them and wonder how they can use a WrappedMapper in their application, for example.) They would be better in the task package I think.

           Owen & I went back and forth on this - finally Owen convinced me it's better to put it in lib.

           > Can we move org.apache.hadoop.mapreduce.task to org.apache.hadoop.mapreduce.server.task to better emphasise that this is non-user code. This reflects the packaging of HDFS more, where things that run on the cluster are under a "server" package. We should have another JIRA to move org.apache.hadoop.mapreduce.task.reduce to org.apache.hadoop.mapreduce.server.task.reduce.

           We already have the new shuffle in org.apache.hadoop.mapreduce.task.reduce, thus org.apache.hadoop.mapreduce.task seemed logical.

           > It's a shame JobContextImpl is public and in org.apache.hadoop.mapreduce since users shouldn't be exposed to it. Can we move it to another package?

           Done.

           > Since Job extends JobContextImpl you don't need the changes that change the *_ATTR constants (e.g. OUTPUT_FORMAT_CLASS_ATTR) to JobContextImpl.*_ATTR - they can be referred to directly.

           Done.

           > What's the compatibility story for previous releases? Would a 0.20 MR program written to the new ("mapreduce" package) API work with the new interfaces unchanged? What about a 0.20 program using the old MR API - will it continue to work with the old MR API with these changes?

           Yes to both.

          Arun C Murthy added a comment -

           Uh, it's javadoc warnings - fixing it.

          Owen O'Malley added a comment -

          +1 except for the JavaDoc warning.

          We should also only include org.apache.hadoop.mapreduce.task's JavaDoc in the javadoc-dev target, but we can do that in a followup.

          Arun C Murthy added a comment -

          Final patch, passes all tests and output of test-patch is:

               [exec] -1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 42 new or modified tests.
               [exec]
               [exec]     -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          

          The javac warning is due to the fact that this patch introduces some new deprecated classes in org.apache.hadoop.mapred.

          Tom White added a comment -

          Thanks for the new patch, Arun. Can we mark the interfaces as "Public Evolving"? This would overcome the problems with API evolvability.

          Some other feedback:

          • I'm not sure that WrappedMapper and WrappedReducer belong in a "lib" package, since the classes in "lib" are user-facing, and these are framework classes. (People might see them and wonder how they can use a WrappedMapper in their application, for example.) They would be better in the task package I think.
          • Can we move org.apache.hadoop.mapreduce.task to org.apache.hadoop.mapreduce.server.task to better emphasise that this is non-user code. This reflects the packaging of HDFS more, where things that run on the cluster are under a "server" package. We should have another JIRA to move org.apache.hadoop.mapreduce.task.reduce to org.apache.hadoop.mapreduce.server.task.reduce.
          • It's a shame JobContextImpl is public and in org.apache.hadoop.mapreduce since users shouldn't be exposed to it. Can we move it to another package?
          • Since Job extends JobContextImpl you don't need the changes that change the *_ATTR constants (e.g. OUTPUT_FORMAT_CLASS_ATTR) to JobContextImpl.*_ATTR - they can be referred to directly.
          • What's the compatibility story for previous releases? Would a 0.20 MR program written to the new ("mapreduce" package) API work with the new interfaces unchanged? What about a 0.20 program using the old MR API - will it continue to work with the old MR API with these changes?
          Arun C Murthy added a comment -

           Updated patch for review while I test. After a discussion with Owen I went ahead and fixed up mrunit to use the interfaces directly, thus illustrating how the new interfaces can be used completely.

          Owen O'Malley added a comment -

          This is looking better. I'd suggest a couple more changes:

          1. Move the getCounter methods from TaskInputOutputContext to JobContext.
          2. Move progress and setStatus from TaskInputOutputContext to TaskAttemptContext.
          3. Make JobContextImpl public.
          4. Make the other context impl classes package local.

          Aaron Kimball added a comment -

          Thanks for the new patch; this does make MRUnit look much more sane.

          Arun C Murthy added a comment -

           Here is an updated patch which shows the plan to use the new org.apache.hadoop.mapreduce.lib.{map|reduce}.Wrapped{Mapper|Reducer} to completely get away from the ugly hacks for mrunit, Chain{Mapper|Reducer} etc.

          Aaron Kimball added a comment -

           I don't see how this new state of affairs improves MRUnit. MRUnit still needs to implement the "hack" of having a dummy Mapper class (MockMapContextWrapper) that wraps around the Context that it creates, when from an architectural standpoint, MRUnit would like to just feed a wholly divorced top-level MockMapContext into the client's Mapper implementation.

          Switching Mapper to include an abstract class Context instead of a concrete class Context which is used by its map() method does not allow MRUnit to do the "right" thing; MRUnit still takes advantage of the fact that Mapper.Context is non-static purely for purposes of saving typing in the map() method's type signature (by absorbing the generic qualifiers) and assumes that it can fully replace the implementation with its own.

          My understanding is that the purpose of moving from interface Mapper to class Mapper was to allow forward-evolvable APIs with default implementations. Switching back from abstract classes to interfaces again seems like asking for trouble a few months down the line when we need to add some default behavior.

          Tom White added a comment -

          Owen, thanks for pointing out MRUnit. MRUnit has a wrapper class, but I don't see how it would be significantly better if Context were an interface.

           > The common theme was that the details, especially in the constructors and fields were far more specific than the interfaces.

          The constructors are a difference between classes and interfaces, but I believe we can mark them as Private Evolving to get around this. I didn't see any fields that had to be added to context classes that wouldn't be there if they were interfaces. I suppose I'm finding it hard to see what concrete things we are gaining in exchange for sacrificing API evolution between major releases.

          Owen O'Malley added a comment -

          I should also add that MRUnit is another use case. With the new cleaner interfaces, it should require a lot fewer hacks to implement. By having a lot less implementation detail, we have a lot lower exposure to changes in the implementation of the context objects.
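           A self-contained sketch of why interfaces help a test harness like MRUnit (the simplified, hypothetical context interface below stands in for the real one):

               import java.util.ArrayList;
               import java.util.List;

               // Stand-in for the real context interface (simplified for illustration).
               interface MapContext<K, V> {
                 void write(K key, V value);
               }

               // A test harness can now provide its own implementation directly,
               // with no wrapper Mapper subclass and no framework internals involved.
               class MockMapContext<K, V> implements MapContext<K, V> {
                 final List<String> output = new ArrayList<String>();
                 public void write(K key, V value) {
                   output.add(key + "\t" + value);  // record instead of shuffling
                 }
               }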

          Owen O'Malley added a comment -

          Thanks Tom, for that. I hadn't seen that syntax for creating nested objects before.

          The use cases you found are indicative of the problem, but it is really wider than that. The common theme was that the details, especially in the constructors and fields were far more specific than the interfaces. My goal with this jira was to protect ourselves from future needs where a copy constructor is not sufficient. Does that make sense?

          Tom White added a comment -

          Only marginally relevant to this discussion, but I noticed that the (existing) Mapper.Context object in MapTask is constructed via reflection. I think it's possible to create one directly like this:

          Mapper mapper = ...
          Mapper.Context mapperContext = mapper.new Context(...);
          

          Same for Reducer.Context.
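
           For reference, the qualified inner-class creation syntax Tom uses here is standard Java; a self-contained toy example with hypothetical Outer/Inner classes:

               public class Outer<T> {
                 public class Inner {                 // non-static: bound to an Outer<T>
                   public T echo(T t) { return t; }
                 }
                 public static void main(String[] args) {
                   Outer<String> outer = new Outer<String>();
                   Outer<String>.Inner inner = outer.new Inner();  // qualified "new"
                   System.out.println(inner.echo("hello"));        // prints "hello"
                 }
               }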

          Tom White added a comment -

          I've gone back to the cases that have motivated this in order to better understand why this change is needed.

          The first motivating case is MAPREDUCE-901, which needs to change the constructor to ReduceContext due to an internal type change. As noted above, user applications should never create context objects, so the constructor could be marked as Private Evolving to solve this problem, couldn't it?

          The other motivating case is MAPREDUCE-372 (chain MapReduce), where various fields of the context need to be replaced with custom versions. The approach described in this comment which is to add a new constructor to ReduceContext (a "pseudo copy constructor"), could again be made Private Evolving to mark its scope. This approach didn't cause any information leakage as far as I can tell (the previous version had to add extra getters, which did cause leakage).

          Another approach would be to create package private constructors and have a Private Evolving ContextFactory in the same package if the intent is to keep the constructors further away from user code (in IDE autocomplete, for example). The point is that I think we can avoid the problem of leakage with some changes to visibility annotations.
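
           A sketch of that last alternative, with purely hypothetical names; both types live in the same package, so only the factory can reach the package-private constructor:

               // Hypothetical sketch (illustrative names only).
               class ReduceContextImpl {
                 ReduceContextImpl(/* framework-internal arguments */) {
                   // package-private: invisible to user code and IDE autocomplete
                 }
               }

               // Marked Private Evolving (via the org.apache.hadoop.classification
               // annotations) to signal it is a framework-only entry point.
               class ContextFactory {
                 static ReduceContextImpl createReduceContext(/* ... */) {
                   return new ReduceContextImpl();
                 }
               }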

          Arun C Murthy added a comment -

          Closer, not quite.

          Arun C Murthy added a comment -

           > I don't see how MapperImpl is actually used. A user-supplied Mapper extends Mapper, not MapperImpl, so I'm missing how the concrete context is actually created.

           That's the nearly complete part... apologies, I should have clarified this. (smile)

           I need to wire it in with a {Mapper|Reducer}Impl.createContext and use it from within the framework.

           > The getProgressible() method of JobContext is misspelled

           Unfortunately I had to carry over the typo from current code - I just converted the existing org.apache.hadoop.mapred.JobContext to an interface as-is for compatibility.

          Tom White added a comment -

          Thanks for posting some code, Arun - makes it easier to follow. A couple of comments:

          • I don't see how MapperImpl is actually used. A user-supplied Mapper extends Mapper, not MapperImpl, so I'm missing how the concrete context is actually created.
          • The getProgressible() method of JobContext is misspelled - it should be getProgressable(). (Ironically this is the kind of change that is so difficult to do with interfaces!)
          Arun C Murthy added a comment -

          Here is a nearly complete patch for illustrative purposes.

           I've had to resort to composition in {Mapper|Reducer}Impl.Context to include a member {Map|Reduce}ContextImpl, to get around the fact that {Mapper|Reducer}.Context is an abstract class, thus resulting in the inability to get {Mapper|Reducer}Impl.Context to extend both {Mapper|Reducer}.Context and {Map|Reduce}ContextImpl (MI).

          Arun C Murthy added a comment -

           After a bit more thought I have to insist that we make {Job|TaskAttempt|TaskInputOutput|Map|Reduce}Context interfaces.

           Let me explain.

           Today each in that list extends the other; however, making them abstract classes and pairing them with concrete {Job|TaskAttempt|TaskInputOutput|Map|Reduce}ContextImpl classes results in the need for multiple inheritance (MI) in {TaskAttempt|TaskInputOutput|Map|Reduce}ContextImpl (e.g. TaskAttemptContextImpl would need to extend JobContextImpl and also be a TaskAttemptContext, thus MI). Hence I propose we make the top-level classes interfaces.

           However, {Mapper|Reducer}.Context will be abstract classes for the same reasons that Owen put down.

           Thoughts?
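
           In code, the shape being proposed looks roughly like this (a sketch; names follow the proposal, bodies elided):

               // Interfaces form the public hierarchy:
               interface JobContext { /* getConfiguration(), ... */ }
               interface TaskAttemptContext extends JobContext { /* getTaskAttemptID(), ... */ }

               // Implementations reuse code by extending the parent impl -- legal
               // only because the public types above are interfaces, not classes:
               class JobContextImpl implements JobContext { /* ... */ }
               class TaskAttemptContextImpl extends JobContextImpl
                   implements TaskAttemptContext { /* ... */ }

               // Had JobContext and TaskAttemptContext been abstract classes,
               // TaskAttemptContextImpl would have needed to extend both
               // JobContextImpl and TaskAttemptContext -- multiple inheritance,
               // which Java forbids.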

          Doug Cutting added a comment -

          +1 This sounds like a good plan to me.

          Owen O'Malley added a comment -

          Only application frameworks like ChainMapper will implement Contexts. Applications should only use Contexts.

           The MapperImpl is private to map/reduce and will only consist of a method to create a ContextImpl. No other method on it will ever be called. ContextImpl will have the method bodies to implement the context and will be given to the user in the places where a Mapper.Context is required.

          Since the Context objects are public, we will need to allow bodies on methods to support backwards compatibility of frameworks as we add methods to them, so they will start as pure abstract classes and slowly gain default bodies for added methods.
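
           A sketch of the evolution Owen describes, with a hypothetical added method: the class starts pure abstract, and a later release introduces a method with a default body so existing framework subclasses keep compiling.

               // Release N: pure abstract class -- no implementation detail.
               abstract class Context {
                 public abstract void write(Object key, Object value);
               }

               // Release N+1 (hypothetical added method): it arrives with a
               // default body, so a framework subclass compiled against release
               // N -- e.g. a ChainMapper-style wrapper -- still compiles:
               //
               //   public void setStatus(String msg) {
               //     // default no-op; implementations may override
               //   }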

          Doug Cutting added a comment -

           > we should take this opportunity to at least make the Context Objects pure abstract classes if not go the whole hog and make them interfaces.

          I'm okay with pure abstract classes but have concerns about the evolvability of interfaces.

          I'm trying to understand Owen's proposal. Here's my guess: Applications won't implement Mapper.Context. Rather the framework will implement it, and applications will access it referencing the abstract API. But for the framework to implement it, it must define it within a Mapper, since it cannot be a static, standalone class and still be generic. The framework's Mapper implementation won't actually be used other than to create a Mapper.ContextImpl--other mapper methods will throw UnimplementedMethodException. Do I have this right?

          Arun C Murthy added a comment -

           After discussing things over with Owen I think we should take this opportunity to at least make the Context Objects pure abstract classes if not go the whole hog and make them interfaces. This will help keep implementation details out of the way... thoughts?

          Arun C Murthy added a comment -

           It's the 'private' annotation... not Java 'private'.

          Amareshwari Sriramadasu added a comment -

           For MAPREDUCE-372, we would like to override the functionality of RecordReader, RecordWriter and the configuration of the context. The problem with the current Context objects is that Mapper.Context or Reducer.Context is an inner class which is not static. To construct the customized Context object (with the customized RR, RW, conf), we would be forced to expose some implementation details from the framework-created context implementations.

           > {Job|TaskAttempt|TaskInputOutput|Map|Reduce}ContextImpl - private

           -1. If *ContextImpl is private, overriding some functionality is not possible. I think *ContextImpl classes should be public evolving.
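
           The inner-class problem being referred to, in miniature (hypothetical names): a non-static inner Context can only be created through an enclosing Mapper instance, so a framework like ChainMapper cannot build a free-standing customized context.

               class Mapper<K, V> {
                 class Context { }                  // non-static inner class
               }

               class ChainFramework {
                 void buildCustomContext() {
                   // There is no free-standing "new Mapper.Context(customRR, ...)";
                   // a Context must be born from some Mapper instance:
                   Mapper<String, String> dummy = new Mapper<String, String>();
                   Mapper<String, String>.Context ctx = dummy.new Context();
                   // ...and customizing the RecordReader/RecordWriter behind it
                   // means depending on the framework's constructor details --
                   // the leakage described above.
                 }
               }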

          Owen O'Malley added a comment -

           > I'd be interested to hear more about what the problem was here. Is it an intrinsic problem caused by using abstract classes, or could we improve this by using better encapsulation?

          The problem came about because of the generics and the way we set it up. Basically, if we do:

          public abstract class TaskContext<K1,V1,K2,V2> {
              public abstract K1 getKey();
            ...
          }
          public class Mapper<K1,V1,K2,V2> {
            public class Context extends TaskContext<K1,V1,K2,V2> {
              public Context(.... a lot of implementation details ... ) { ... details ... }
              public K1 getKey() { ... details ... }
            }
          }
          

          Since Mapper.Context is an inner class, it has to be concrete. Clearly, my goal for this jira was to remove the implementation details. After playing with prototypes, I'd suggest we back off of interfaces and make it look like:

          public abstract class TaskContext<K1,V1,K2,V2> {
            ...
          }
          public class Mapper<K1,V1,K2,V2> {
            public abstract class Context extends TaskContext<K1,V1,K2,V2> {
            }
          }
          

          and then the implementation can be done as:

          class MapperImpl<K1,V1,K2,V2> extends Mapper<K1,V1,K2,V2> {
             class ContextImpl extends Context {
              ... details ...
            }
          }
          
          Doug Cutting added a comment -

          > to protect users who need to implement custom context objects against changes to the interfaces

          I'm confused.

          Interfaces differ from abstract classes in two ways:

          1. Abstract classes can have (default) method implementations. This permits APIs to evolve without breaking callers or implementors.
          2. Multiple interfaces can be inherited.

          The primary advantage of interfaces is the second. But if implementations extend a base class this advantage is voided. What then is the utility of the interface?
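
           Doug's second point, in miniature (hypothetical types; Progressable stands in for any second capability):

               interface TaskAttemptContext { }
               interface Progressable { void progress(); }

               // Legal: one class may implement any number of interfaces...
               class TaskContextImpl implements TaskAttemptContext, Progressable {
                 public void progress() { /* report liveness */ }
               }

               // ...but may extend only one class, so if every implementation
               // must extend a common base class, that flexibility is gone:
               //   class TaskContextImpl extends BaseA, BaseB { }  // won't compile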

          Tom White added a comment -

          > It caused a lot of information leakage into the public classes.

           I'd be interested to hear more about what the problem was here. Is it an intrinsic problem caused by using abstract classes, or could we improve this by using better encapsulation?

          I worry about how we evolve the API, since we can't add methods to public stable interfaces, whereas we can for public stable abstract classes. Do you have a suggested workaround for this?

          Arun C Murthy added a comment -

           Furthermore, Owen and I discussed that it would be appropriate to provide forwarding 'base' end-user facing implementations of each of the interfaces, or abstract base classes which implement the interfaces, to protect users who need to implement custom context objects against changes to the interfaces.

           So, in effect we will have:

           interfaces: {Job|TaskAttempt|TaskInputOutput|Map|Reduce}Context - public stable
           implementations: {Job|TaskAttempt|TaskInputOutput|Map|Reduce}ContextImpl - private

           and

           abstract classes: Abstract{Job|TaskAttempt|TaskInputOutput|Map|Reduce}Context - public stable
           or
           concrete classes: Forwarding{Job|TaskAttempt|TaskInputOutput|Map|Reduce}Context - public stable

           Thoughts?
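
           The 'Forwarding' variant would look roughly like this (hypothetical sketch; the simplified interface below stands in for the real JobContext, which has many more methods, each forwarded the same way): it implements the interface by delegating every call, so user subclasses override only what they need and keep compiling when methods are added.

               import org.apache.hadoop.conf.Configuration;

               interface JobContext {
                 Configuration getConfiguration();
               }

               // Delegates everything to a backing context. When a method is
               // added to JobContext, only this class must change in step;
               // user subclasses of ForwardingJobContext keep compiling.
               class ForwardingJobContext implements JobContext {
                 private final JobContext delegate;
                 ForwardingJobContext(JobContext delegate) { this.delegate = delegate; }
                 public Configuration getConfiguration() {
                   return delegate.getConfiguration();
                 }
               }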

          Arun C Murthy added a comment -

          Initial cut for early preview. src/java compiles, onwards towards src/test etc.

          Arun C Murthy added a comment -

          +1

          We should bite the bullet and fix it sooner rather than later. I've run into issues with this in MAPREDUCE-901 etc.


             People

             • Assignee: Arun C Murthy
             • Reporter: Owen O'Malley
             • Votes: 0
             • Watchers: 18