Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Labels:

      Description

      The new MapReduce API changed in 0.21 (MAPREDUCE-954) in such a way as to break MRUnit.

      Attachments

      1. MRUNIT-31.patch (25 kB) - Tom White
      2. MRUNIT-31.patch (23 kB) - Brock Noland
      3. MRUNIT-31.patch (19 kB) - Brock Noland
      4. MRUNIT-31.patch (11 kB) - Tom White
      5. MRUNIT-31.patch (1 kB) - Tom White

          Activity

          Tom White added a comment -

          This patch illustrates the problem by updating the Hadoop version to 0.23. (Try "mvn clean install -DskipTests".)

          The problem is that org.apache.hadoop.mrunit.mapreduce.mock.MockMapContextWrapper and org.apache.hadoop.mrunit.mapreduce.mock.MockReduceContextWrapper no longer compile since the context objects were changed to interfaces.

          This doesn't affect the public MRUnit API (MapDriver etc.), so it should be possible to support both APIs and switch at runtime using Java reflection. Another approach would be to publish two artifacts and have the user pick the one they want to use.
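
          A minimal sketch of the runtime-switch idea (a hypothetical helper, not part of any attached patch): in 0.20 org.apache.hadoop.mapreduce.MapContext is a concrete class, while from 0.21 onwards it is an interface, so Class#isInterface() is enough to decide which code path to take.

            // Hypothetical detector, for illustration only.
            // MAPREDUCE-954 turned MapContext into an interface, so this check
            // distinguishes the 0.20 API from the 0.21+/0.23 API at runtime.
            public final class ContextApiDetector {
              private ContextApiDetector() {}

              public static boolean isInterfaceBasedApi() {
                return org.apache.hadoop.mapreduce.MapContext.class.isInterface();
              }
            }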

          Thoughts?

          Brock Noland added a comment -

          Since it's distinguishable at runtime, I'd vote for that method over two artifacts.

          Tom White added a comment -

          Here's a patch that passes tests against Hadoop 0.23. It shouldn't be committed though, since it doesn't work with 0.20.

          The problem is that Mapper.Context has a different constructor in 0.20 than in 0.23, so it's not possible to write a subclass that works in both. The simplest reflective solution would be to use a mock object library. CGLIB would work too.

          Brock Noland added a comment -

          If we used a mock library, would we just mock Mapper.Context (and Reducer.Context) in both cases?

            import static org.mockito.Mockito.any;
            import static org.mockito.Mockito.mock;
            import static org.mockito.Mockito.when;

            import java.util.List;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.mapreduce.Counter;
            import org.apache.hadoop.mapreduce.Counters;
            import org.apache.hadoop.mapreduce.Mapper;
            import org.apache.hadoop.mrunit.types.Pair;
            import org.mockito.invocation.InvocationOnMock;
            import org.mockito.stubbing.Answer;

            @SuppressWarnings({ "unchecked", "rawtypes" })
            public static <KEYIN, VALUEIN, KEYOUT, VALUEOUT> Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context
            create(final List<Pair<KEYIN, VALUEIN>> in, final Counters counters, Configuration conf) {
              Class<org.apache.hadoop.mapreduce.Mapper.Context> mapContext = org.apache.hadoop.mapreduce.Mapper.Context.class;
              Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.Context context = (Mapper.Context) mock(mapContext);
              // Route counter lookups through the driver-supplied Counters instance.
              when(context.getCounter((Enum) any())).thenAnswer(new Answer<Counter>() {
                @Override
                public Counter answer(InvocationOnMock invocation) {
                  Object[] args = invocation.getArguments();
                  // should use guava preconditions
                  return counters.findCounter((Enum) args[0]);
                }
              });
              return context;
            }
          
          Tom White added a comment -

          Yes.

          Brock Noland added a comment - edited

          I have a version of the code using Mockito which is working with 0.20. It compiles against 0.23 but the tests fail as there is an ugly cast in o.a.h.mapreduce.Reducer:

          // If a back up store is used, reset it
          ((ReduceContext.ValueIterator)
          (context.getValues().iterator())).resetBackupStore();

          That should really be using instanceof. It looks like it was introduced in 0.21 or in another 0.20 branch.
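
          For illustration, one way such an instanceof guard might look inside Reducer.run() (a sketch only, not a committed fix):

          // Reset the backup store only when the iterator actually supports it,
          // so a mocked iterator no longer triggers a ClassCastException.
          Iterator<VALUEIN> iter = context.getValues().iterator();
          if (iter instanceof ReduceContext.ValueIterator) {
            ((ReduceContext.ValueIterator) iter).resetBackupStore();
          }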

          Brock Noland added a comment -

          Attached is said patch (not meant for commit); it changes the pom to 0.23 and adds Mockito/Guava as deps.

          Tom White added a comment -

          Nice work! I think you forgot to add MockContextWrapper to the patch.

          Regarding the cast failure, can we mock out the call to context.getValues().iterator() to return an instance of the expected type? (We should file and fix this issue upstream too though.)

          Brock Noland added a comment -

          Attached is the correct patch.

          I am not sure how to return the correct type, as the class Reducer.ValuesIterator is a protected instance member. There may be a way, but I am not seeing it.

          I'll file a patch upstream.

          Brock Noland added a comment -

          I created MAPREDUCE-3344 upstream and updated it with a patch.

          Tom White added a comment -

          With MAPREDUCE-3344 committed I can run the tests against 0.24.0-SNAPSHOT (note that the hadoop-mapred-test artifact is not needed). It's probably worth having multiple profiles for the different Hadoop builds, like MAHOUT-822 does.

          Tom White added a comment -

          Here's an updated patch with Maven profiles to support both 0.20 and 0.23 builds. The default is 0.20.2, but you can use another version by setting hadoop.version. E.g.

          mvn clean install -Dhadoop.version=0.24.0-SNAPSHOT
          
          Brock Noland added a comment -

          Committed as 1203354 to trunk.


            People

            • Assignee: Brock Noland
            • Reporter: Tom White
            • Votes: 0
            • Watchers: 1
