MRUnit
  1. MRUnit
  2. MRUNIT-64

Multiple Input Key, Value Pairs should be supported

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 1.0.0
    • Labels:
      None

      Description

      The comments on MRUNIT-14 brought up a great point. The API today only allows a single key value pair input. In many scenarios multiple inputs would be useful.

      1. MRUNIT-64.patch
        18 kB
        James Kinley
      2. MRUNIT-64-v1.patch
        31 kB
        James Kinley
      3. MRUNIT-64-v2.patch
        21 kB
        James Kinley

        Issue Links

          Activity

          Hide
          Brock Noland added a comment -

          Committed in 1369090. Thanks for you contribution James!

          Show
          Brock Noland added a comment - Committed in 1369090. Thanks for you contribution James!
          Hide
          Brock Noland added a comment -

          Cool, I will review it tomorrow! Thanks James!

          Show
          Brock Noland added a comment - Cool, I will review it tomorrow! Thanks James!
          Hide
          James Kinley added a comment -

          Hi Bertrand, I uploaded the v2 patch as you added your comment. The v2 patch adds multi input support for mapred, so if the patch is ok both api's will be supported and handled in the same way.

          Show
          James Kinley added a comment - Hi Bertrand, I uploaded the v2 patch as you added your comment. The v2 patch adds multi input support for mapred, so if the patch is ok both api's will be supported and handled in the same way.
          Hide
          James Kinley added a comment -

          Added javadoc comments to deprecated methods, and added multi-input support for the mapred api

          Show
          James Kinley added a comment - Added javadoc comments to deprecated methods, and added multi-input support for the mapred api
          Hide
          Bertrand Dechoux added a comment -

          Did the patch get already applied? I am doing a first patch for MRUNIT-125. As the result the input setters will be the same whether it is the mapred or the mapreduce API. However, the inputs are not checked the same way for both APIs. For me it would mean that multiple inputs are not handled the same way for both APIs. That's why I am asking here.

          Eg MapDriver
          -> mapred

          if (inputKey == null || inputVal == null)

          { throw new IllegalStateException("No input was provided"); }

          -> mapreduce

          if (inputKey != null && inputVal != null) { setInput(inputKey, inputVal); }

          if (inputs == null || inputs.size() == 0) { throw new IllegalStateException("No input was provided"); }
          Show
          Bertrand Dechoux added a comment - Did the patch get already applied? I am doing a first patch for MRUNIT-125 . As the result the input setters will be the same whether it is the mapred or the mapreduce API. However, the inputs are not checked the same way for both APIs. For me it would mean that multiple inputs are not handled the same way for both APIs. That's why I am asking here. Eg MapDriver -> mapred if (inputKey == null || inputVal == null) { throw new IllegalStateException("No input was provided"); } -> mapreduce if (inputKey != null && inputVal != null) { setInput(inputKey, inputVal); } if (inputs == null || inputs.size() == 0) { throw new IllegalStateException("No input was provided"); }
          Hide
          Jim Donofrio added a comment -

          Yes you are right abut the check in ReduceDriver and then MapDriver only needs to check if inputs.size == 0 since it cannot be null.

          if (inputs.isEmpty())

          { throw new IllegalStateException("No input was provided"); }
          Show
          Jim Donofrio added a comment - Yes you are right abut the check in ReduceDriver and then MapDriver only needs to check if inputs.size == 0 since it cannot be null. if (inputs.isEmpty()) { throw new IllegalStateException("No input was provided"); }
          Hide
          Brock Noland added a comment -

          The next patch, just do it on top of the current trunk (which includes this commit).

          Show
          Brock Noland added a comment - The next patch, just do it on top of the current trunk (which includes this commit).
          Hide
          James Kinley added a comment -

          No problem, I will also add changes to mapred.

          Show
          James Kinley added a comment - No problem, I will also add changes to mapred.
          Hide
          Brock Noland added a comment -

          No changes were made to the mapred API. I suppose we have to fully support the old API until it's re-depreciated. Sorry James, my fault.

          Show
          Brock Noland added a comment - No changes were made to the mapred API. I suppose we have to fully support the old API until it's re-depreciated. Sorry James, my fault.
          Hide
          Dave Beech added a comment -

          (I replied to this earlier on the dev list rather than JIRA, so apologies for duplicate comment!)

          I'm still using the old API for most of my mapreduce jobs - we used to use Apache Hadoop 0.20.1 and the new API was embarrassingly incomplete and buggy. I'm aware that a lot of the missing features (MultipleInputs etc) have since been added in later versions but I've stuck with the old because I trust it more.

          Until the old API is killed off by Hadoop - and I don't mean the half-hearted deprecation that was reverted in the 0.21 branch - I think we should continue to fully support it on mrunit too.

          Show
          Dave Beech added a comment - (I replied to this earlier on the dev list rather than JIRA, so apologies for duplicate comment!) – I'm still using the old API for most of my mapreduce jobs - we used to use Apache Hadoop 0.20.1 and the new API was embarrassingly incomplete and buggy. I'm aware that a lot of the missing features (MultipleInputs etc) have since been added in later versions but I've stuck with the old because I trust it more. Until the old API is killed off by Hadoop - and I don't mean the half-hearted deprecation that was reverted in the 0.21 branch - I think we should continue to fully support it on mrunit too.
          Hide
          James Kinley added a comment -

          Agree on the javadoc, I will add this.

          I'd hope that the majority are using the new api by now, and going forward mapred will be phased out, so didn't think it was necessary to add it to the old api too.

          inputs is initialised in the base class so should never be null. I have added the check to the MapDriver so will also add it here for consistency.

          Thanks for the feedback.

          Show
          James Kinley added a comment - Agree on the javadoc, I will add this. I'd hope that the majority are using the new api by now, and going forward mapred will be phased out, so didn't think it was necessary to add it to the old api too. inputs is initialised in the base class so should never be null. I have added the check to the MapDriver so will also add it here for consistency. Thanks for the feedback.
          Hide
          Jim Donofrio added a comment -

          I should have commented earlier but for all the deprecations you should put in the javadoc why and what the user should use instead.

          Why were the fluent methods only added to the mapreduce api?

          Also why werent the same changes made to the mapred run methods as for mapreduce?

          Why in ReduceDriver does this not check for inputKey == null:
          if (inputs.isEmpty())

          { throw new IllegalStateException("No input was provided"); }
          Show
          Jim Donofrio added a comment - I should have commented earlier but for all the deprecations you should put in the javadoc why and what the user should use instead. Why were the fluent methods only added to the mapreduce api? Also why werent the same changes made to the mapred run methods as for mapreduce? Why in ReduceDriver does this not check for inputKey == null: if (inputs.isEmpty()) { throw new IllegalStateException("No input was provided"); }
          Hide
          Brock Noland added a comment -

          OK, nice work! I have ran all the unit tests with both hadoop 1 and hadoop 2, they all pass and my worries about backwards incompatibility appear to be unfounded. I have committed this in 1366869!

          Thank you for your contribution James!

          Show
          Brock Noland added a comment - OK, nice work! I have ran all the unit tests with both hadoop 1 and hadoop 2, they all pass and my worries about backwards incompatibility appear to be unfounded. I have committed this in 1366869! Thank you for your contribution James!
          Hide
          James Kinley added a comment -

          Hi Brock,

          The addInput() methods for the MapDriver and ReduceDriver are added by this patch, so I don't think these tests are valid in the current API. The MapDriver and ReduceDriver have setInput(), setInputKey(), and setInputValue() only, which I have deprecated.

          The MapReduceDriver does have withInput() / addInput() but I've not changed how this works, just supplemented them with withAll() / addAll() and allowed the entire list to be passed to the mapper / reducer (so it doesn't have to create a new instance for each record).

          James.

          Show
          James Kinley added a comment - Hi Brock, The addInput() methods for the MapDriver and ReduceDriver are added by this patch, so I don't think these tests are valid in the current API. The MapDriver and ReduceDriver have setInput(), setInputKey(), and setInputValue() only, which I have deprecated. The MapReduceDriver does have withInput() / addInput() but I've not changed how this works, just supplemented them with withAll() / addAll() and allowed the entire list to be passed to the mapper / reducer (so it doesn't have to create a new instance for each record). James.
          Hide
          Brock Noland added a comment -

          Looks great! Just one item left... Below are some tests I think we should add and get passing. They are valid uses of the API prior to this change. I think the root of the problem is the addInput API behaves differently. Perhaps we should preserve it's behavior and then deprecate all APIs which do not conform to the new *all methods?

          TestMapDriver

            @Test
            public void testWithBehaviorPriorToMRUNIT64() throws IOException {
              driver.withInput(new Text("foo"), new Text("bar"));
              driver.withOutput(new Text("foo"), new Text("bar"));
              driver.resetOutput();
              driver.withInput(new Text("foo"), new Text("bar"));
              driver.withOutput(new Text("foo"), new Text("bar"));
              driver.runTest();
            }
            @Test
            public void testAddBehaviorPriorToMRUNIT64() throws IOException {
              driver.addInput(new Text("foo"), new Text("bar"));
              driver.addOutput(new Text("foo"), new Text("bar"));
              driver.resetOutput();
              driver.addInput(new Text("foo"), new Text("bar"));
              driver.addOutput(new Text("foo"), new Text("bar"));
              driver.runTest();
            }
          

          Test ReduceDriver

            @Test
            public void testWithBehaviorPriorToMRUNIT64() throws IOException {
              List<LongWritable> values = new ArrayList<LongWritable>();
              values.add(new LongWritable(1));
              values.add(new LongWritable(2));
              values.add(new LongWritable(3));
              driver.withInput(new Pair<Text, List<LongWritable>>(new Text("foo"), values));
              driver.withOutput(new Text("foo"), new LongWritable(6));
              driver.runTest();
              
              driver.resetOutput();
              driver.withInput(new Pair<Text, List<LongWritable>>(new Text("foo"), values));
              driver.withOutput(new Text("foo"), new LongWritable(6));
              driver.runTest();
            }
            @Test
            public void testAddBehaviorPriorToMRUNIT64() throws IOException {
              List<LongWritable> values = new ArrayList<LongWritable>();
              values.add(new LongWritable(1));
              values.add(new LongWritable(2));
              values.add(new LongWritable(3));
              driver.addInput(new Pair<Text, List<LongWritable>>(new Text("foo"), values));
              driver.addOutput(new Text("foo"), new LongWritable(6));
              driver.runTest();
              
              driver.resetOutput();
              driver.addInput(new Pair<Text, List<LongWritable>>(new Text("foo"), values));
              driver.addOutput(new Text("foo"), new LongWritable(6));
              driver.runTest();
            }
          
          Show
          Brock Noland added a comment - Looks great! Just one item left... Below are some tests I think we should add and get passing. They are valid uses of the API prior to this change. I think the root of the problem is the addInput API behaves differently. Perhaps we should preserve it's behavior and then deprecate all APIs which do not conform to the new *all methods? TestMapDriver @Test public void testWithBehaviorPriorToMRUNIT64() throws IOException { driver.withInput( new Text( "foo" ), new Text( "bar" )); driver.withOutput( new Text( "foo" ), new Text( "bar" )); driver.resetOutput(); driver.withInput( new Text( "foo" ), new Text( "bar" )); driver.withOutput( new Text( "foo" ), new Text( "bar" )); driver.runTest(); } @Test public void testAddBehaviorPriorToMRUNIT64() throws IOException { driver.addInput( new Text( "foo" ), new Text( "bar" )); driver.addOutput( new Text( "foo" ), new Text( "bar" )); driver.resetOutput(); driver.addInput( new Text( "foo" ), new Text( "bar" )); driver.addOutput( new Text( "foo" ), new Text( "bar" )); driver.runTest(); } Test ReduceDriver @Test public void testWithBehaviorPriorToMRUNIT64() throws IOException { List<LongWritable> values = new ArrayList<LongWritable>(); values.add( new LongWritable(1)); values.add( new LongWritable(2)); values.add( new LongWritable(3)); driver.withInput( new Pair<Text, List<LongWritable>>( new Text( "foo" ), values)); driver.withOutput( new Text( "foo" ), new LongWritable(6)); driver.runTest(); driver.resetOutput(); driver.withInput( new Pair<Text, List<LongWritable>>( new Text( "foo" ), values)); driver.withOutput( new Text( "foo" ), new LongWritable(6)); driver.runTest(); } @Test public void testAddBehaviorPriorToMRUNIT64() throws IOException { List<LongWritable> values = new ArrayList<LongWritable>(); values.add( new LongWritable(1)); values.add( new LongWritable(2)); values.add( new LongWritable(3)); driver.addInput( new Pair<Text, List<LongWritable>>( new Text( "foo" ), values)); driver.addOutput( new Text( "foo" ), new LongWritable(6)); driver.runTest(); driver.resetOutput(); driver.addInput( new Pair<Text, List<LongWritable>>( new Text( "foo" ), values)); driver.addOutput( new Text( "foo" ), new LongWritable(6)); driver.runTest(); }
          Hide
          James Kinley added a comment -

          Updated patch with additional changes to support multiple inputs (addAll) in the ReduceDriver & ReducePhaseRunner + tests.

          Show
          James Kinley added a comment - Updated patch with additional changes to support multiple inputs (addAll) in the ReduceDriver & ReducePhaseRunner + tests.
          Hide
          James Kinley added a comment -

          Please can you hold back looking at the attached patch? There are some additional changes I'd like to make to the ReducePhaseRunner. I will attach an updated patch asap.

          Show
          James Kinley added a comment - Please can you hold back looking at the attached patch? There are some additional changes I'd like to make to the ReducePhaseRunner. I will attach an updated patch asap.
          Hide
          Jim Donofrio added a comment -

          This is also necessary to test jobs that queue up input and then do the real work in the close/cleanup method

          Show
          Jim Donofrio added a comment - This is also necessary to test jobs that queue up input and then do the real work in the close/cleanup method
          Hide
          Brock Noland added a comment -

          Hi Leroy,

          Did you mean the hadoop link? If so it's here: http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapred/lib/MultipleInputs.html

          Brock

          Show
          Brock Noland added a comment - Hi Leroy, Did you mean the hadoop link? If so it's here: http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/mapred/lib/MultipleInputs.html Brock
          Hide
          Leroy Kendall added a comment -

          Can you please give a link to the JIRA about MultipleInputs class if such does exist.

          Show
          Leroy Kendall added a comment - Can you please give a link to the JIRA about MultipleInputs class if such does exist.
          Hide
          Brock Noland added a comment -

          Good point, this JIRA is not about the MultipleInputs class. I think supporting MultipleInputs, via a separate JIRA, could be good feature.

          Show
          Brock Noland added a comment - Good point, this JIRA is not about the MultipleInputs class. I think supporting MultipleInputs, via a separate JIRA, could be good feature.
          Hide
          Jim Donofrio added a comment -

          And just to clarify by multiple inputs we are talking about giving a mapper or reducer class multiple sets of key, value pairs not using the MultipleInputs class. However, what is your thought on adding support for the MultipleInputs class to the MapReduceDriver. A user would then test their different mappers for multiple inputs with the MapDriver and could then test the combination of the mappers with a reducer in the MapReduceDriver class

          Show
          Jim Donofrio added a comment - And just to clarify by multiple inputs we are talking about giving a mapper or reducer class multiple sets of key, value pairs not using the MultipleInputs class. However, what is your thought on adding support for the MultipleInputs class to the MapReduceDriver. A user would then test their different mappers for multiple inputs with the MapDriver and could then test the combination of the mappers with a reducer in the MapReduceDriver class

            People

            • Assignee:
              James Kinley
              Reporter:
              Brock Noland
            • Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development