MRUnit
  1. MRUnit
  2. MRUNIT-66

null input checks and behavior on no input to a driver are inconsistent

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.9.0
    • Labels:
      None

      Description

      the MapDriver class does not allow null input values if using setInput(Pair) but does allow null input if using setInputKey, setInputValue, or setInput(key, value)

      Also the MapDriver, ReduceDriver classes will throw null pointer exceptions with no input while the MapReduceDriver and Pipeline classes will just log warnings

        Activity

        Hide
        Carlos Espinoza added a comment -

        Thanks Jim. I haven't had to use MapReduceDriver. Only MapDriver or ReduceDriver individually. The null check would be good if there is a way of telling that a key/value is being used as an intermediate key.

        Show
        Carlos Espinoza added a comment - Thanks Jim. I haven't had to use MapReduceDriver. Only MapDriver or ReduceDriver individually. The null check would be good if there is a way of telling that a key/value is being used as an intermediate key.
        Hide
        Jim Donofrio added a comment -

        Yes this was an oversight on my part. The made reason I made this change was to avoid people trying to set mappers to null and then wonder why they get a null pointer later. I will create a new JIRA, MRUNIT-157, and allow null keys and values except when testing keys out of the mapper with the MapReduceDriver, right?

        The bug in the 2nd comment is that we try to create copies of the output to add to the list but dont try to handle null's correctly. I will also address this in MRUNIT-157

        Show
        Jim Donofrio added a comment - Yes this was an oversight on my part. The made reason I made this change was to avoid people trying to set mappers to null and then wonder why they get a null pointer later. I will create a new JIRA, MRUNIT-157 , and allow null keys and values except when testing keys out of the mapper with the MapReduceDriver, right? The bug in the 2nd comment is that we try to create copies of the output to add to the list but dont try to handle null's correctly. I will also address this in MRUNIT-157
        Hide
        Carlos Espinoza added a comment -

        I've also tried driver.run() as opposed to driver.runTest() so that I can validate the output myself, but it fails with

        java.lang.NullPointerException
        at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:52)
        at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:81)
        at org.apache.hadoop.mrunit.mock.MockOutputCollector.deepCopy(MockOutputCollector.java:45)
        at org.apache.hadoop.mrunit.mock.MockOutputCollector.collect(MockOutputCollector.java:54)
        ...

        This is because the mapper returns a null key because it expects that there will be no reducer. Again, Hadoop does not have a problem with this.

        From the API, the run function should behave as follows:
        Runs the test but returns the result set instead of validating it (ignores any addOutput(), etc calls made before this)

        Seems to me like MRUnit should return the output.

        Show
        Carlos Espinoza added a comment - I've also tried driver.run() as opposed to driver.runTest() so that I can validate the output myself, but it fails with java.lang.NullPointerException at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:52) at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:81) at org.apache.hadoop.mrunit.mock.MockOutputCollector.deepCopy(MockOutputCollector.java:45) at org.apache.hadoop.mrunit.mock.MockOutputCollector.collect(MockOutputCollector.java:54) ... This is because the mapper returns a null key because it expects that there will be no reducer. Again, Hadoop does not have a problem with this. From the API, the run function should behave as follows: Runs the test but returns the result set instead of validating it (ignores any addOutput(), etc calls made before this) Seems to me like MRUnit should return the output.
        Hide
        Carlos Espinoza added a comment -

        I'm having a problem with this change. Perhaps you can suggest a work around. We are currently upgrading to CDH4 which required us to use MRUnit 0.9.0

        It doesn't seem like this fix reflects what the framework does. Hadoop allows for a null key when it's not being used as an intermediate key. For instance, if the key is null and the value is something else. Then it'll only write the value to the output and it doesn't complain. It only has a problem with it when it is being used as an intermediate key

        Having said that, some of our mapper tests are failing with returnNonNull when they output null as a key even though they are not being used as intermediate keys. The last thing we want to do is change the mapper code, so I'm wondering, are there any work arounds you can suggest? Thanks for your help

        Show
        Carlos Espinoza added a comment - I'm having a problem with this change. Perhaps you can suggest a work around. We are currently upgrading to CDH4 which required us to use MRUnit 0.9.0 It doesn't seem like this fix reflects what the framework does. Hadoop allows for a null key when it's not being used as an intermediate key. For instance, if the key is null and the value is something else. Then it'll only write the value to the output and it doesn't complain. It only has a problem with it when it is being used as an intermediate key Having said that, some of our mapper tests are failing with returnNonNull when they output null as a key even though they are not being used as intermediate keys. The last thing we want to do is change the mapper code, so I'm wondering, are there any work arounds you can suggest? Thanks for your help
        Hide
        Jim Donofrio added a comment -

        added new ArgumentChecker class in internal.util with returnNonNull method which returns the input if not null or throws an exception if null
        also added returnNonNull for lists to check that all entries in a list are not null
        internal package will be for classes that have to be public to be used by the api but should not be relied upon by users, MRUNIT-92 will handle not generating javadoc for internal classes
        added returnNonNull calls to all public api methods
        simplified Pair implementation since null's are no longer allowed, null's dont make sense in Pair since no mapper/reducer will input/output null

        committed in 1305185

        Show
        Jim Donofrio added a comment - added new ArgumentChecker class in internal.util with returnNonNull method which returns the input if not null or throws an exception if null also added returnNonNull for lists to check that all entries in a list are not null internal package will be for classes that have to be public to be used by the api but should not be relied upon by users, MRUNIT-92 will handle not generating javadoc for internal classes added returnNonNull calls to all public api methods simplified Pair implementation since null's are no longer allowed, null's dont make sense in Pair since no mapper/reducer will input/output null committed in 1305185
        Hide
        Jim Donofrio added a comment - - edited

        Is it true that null should never be a legitimate key or value as input output to a mapper or reducer?

        I guess it is possible if someone wrote a custom RecordReader that could pass null to a custom mapper or a RecordWriter that would accept null from a custom mapper or reducer. However, I dont think this would be a good practice, since that RecordReader would be incompatible with any other mapper/reducer and those mappers/reducers would be incompatible with any other RecordWriter.

        I think I would be ok then in not allowing null as input to any of our public methods including the Pair class

        Show
        Jim Donofrio added a comment - - edited Is it true that null should never be a legitimate key or value as input output to a mapper or reducer? I guess it is possible if someone wrote a custom RecordReader that could pass null to a custom mapper or a RecordWriter that would accept null from a custom mapper or reducer. However, I dont think this would be a good practice, since that RecordReader would be incompatible with any other mapper/reducer and those mappers/reducers would be incompatible with any other RecordWriter. I think I would be ok then in not allowing null as input to any of our public methods including the Pair class
        Hide
        Jim Donofrio added a comment -

        throw new IllegalStateException("No input was provided"); at the beginning of the run() methods in the 7 drivers if the input list or key, values are null or empty

        committed in 1304939

        Show
        Jim Donofrio added a comment - throw new IllegalStateException("No input was provided"); at the beginning of the run() methods in the 7 drivers if the input list or key, values are null or empty committed in 1304939
        Hide
        Brock Noland added a comment -

        Sounds good to me!

        Show
        Brock Noland added a comment - Sounds good to me!
        Hide
        Jim Donofrio added a comment -

        http://docs.oracle.com/javase/6/docs/api/java/lang/NullPointerException.html

        Javadoc seems clear that we should throw a NullPointerException for "Applications should throw instances of this class to indicate other illegal uses of the null object." I think I will add a method to check for null and a throw an exception with the passed in argument name everywhere we take an input from the outside.

        I will change the driver classes to all throw IllegalStateException when the user doesnt pass in any input to a driver

        Show
        Jim Donofrio added a comment - http://docs.oracle.com/javase/6/docs/api/java/lang/NullPointerException.html Javadoc seems clear that we should throw a NullPointerException for "Applications should throw instances of this class to indicate other illegal uses of the null object." I think I will add a method to check for null and a throw an exception with the passed in argument name everywhere we take an input from the outside. I will change the driver classes to all throw IllegalStateException when the user doesnt pass in any input to a driver

          People

          • Assignee:
            Jim Donofrio
            Reporter:
            Jim Donofrio
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Due:
              Created:
              Updated:
              Resolved:

              Development