MRUnit
  1. MRUnit
  2. MRUNIT-70

copy(orig, conf) in Serialization shouldn't require objects to have a no-args constructor, and copy(orig, copy, conf) seems to violate contract for deserializer.deserialize()

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Labels:
      None

      Description

      The copy(orig, conf) method requires objects to have a no-args constructor, which is non-ideal, since it passes on to Deserializer.deserialize, which can create a new object if passed in null. Additionally, the contract for deserialize only can fill in the object passed in, so the copy method will not perform as expected if it does not.

      http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/serializer/Deserializer.html#deserialize%28T%29

        Activity

        Hide
        Brock Noland added a comment -

        Writables must have a default constructor because they are created via reflection. WritableSerialization, the class, does the same thing we do via ReflectionUtils. Are you not using Writables?

        https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/io/serializer/

        Show
        Brock Noland added a comment - Writables must have a default constructor because they are created via reflection. WritableSerialization, the class, does the same thing we do via ReflectionUtils. Are you not using Writables? https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/io/serializer/
        Hide
        Brock Noland added a comment -
        Show
        Brock Noland added a comment - Yes I do think this is a bug. I am guessing you are using JavaSerialization? https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/io/serializer/JavaSerialization.java
        Hide
        Meloss Xeloss added a comment -

        Implemented a Serialization and passing it it via "io.serializations" in a Configuration, and as you correctly inferred, I'm not using a Writable.

        Show
        Meloss Xeloss added a comment - Implemented a Serialization and passing it it via "io.serializations" in a Configuration, and as you correctly inferred, I'm not using a Writable.
        Hide
        Brock Noland added a comment -

        OK, looks like a fairly easy fix. We basically shouldn't be calling ReflectionUtils and should return whatever deserialize returns. If anyone creates a patch, please post it. Otherwise I should have time tomorrow.

        Show
        Brock Noland added a comment - OK, looks like a fairly easy fix. We basically shouldn't be calling ReflectionUtils and should return whatever deserialize returns. If anyone creates a patch, please post it. Otherwise I should have time tomorrow.
        Hide
        Goir Riog added a comment -

        We're using Hadoop with Cassandra.
        This requires to use ByteBuffer objects for the key as return values from the Reducer. The ByteBuffer doesnt have a empty Constructor so we get the following exception.

        java.lang.RuntimeException: java.lang.NoSuchMethodException: java.nio.HeapByteBuffer.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.mrunit.mock.MockOutputCollector.getInstance(MockOutputCollector.java:62)
        at org.apache.hadoop.mrunit.mock.MockOutputCollector.deepCopy(MockOutputCollector.java:73)
        at org.apache.hadoop.mrunit.mock.MockOutputCollector.collect(MockOutputCollector.java:110)
        at org.apache.hadoop.mrunit.mapreduce.mock.MockReduceContextWrapper$MockReduceContext.write(MockReduceContextWrapper.java:204)
        [...]

        is there any patch yet ?

        Show
        Goir Riog added a comment - We're using Hadoop with Cassandra. This requires to use ByteBuffer objects for the key as return values from the Reducer. The ByteBuffer doesnt have a empty Constructor so we get the following exception. java.lang.RuntimeException: java.lang.NoSuchMethodException: java.nio.HeapByteBuffer.<init>() at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115) at org.apache.hadoop.mrunit.mock.MockOutputCollector.getInstance(MockOutputCollector.java:62) at org.apache.hadoop.mrunit.mock.MockOutputCollector.deepCopy(MockOutputCollector.java:73) at org.apache.hadoop.mrunit.mock.MockOutputCollector.collect(MockOutputCollector.java:110) at org.apache.hadoop.mrunit.mapreduce.mock.MockReduceContextWrapper$MockReduceContext.write(MockReduceContextWrapper.java:204) [...] is there any patch yet ?
        Hide
        Jim Donofrio added a comment - - edited

        I uploaded a patch, is this what you had in mind Brock?

        If so I will commit this later

        Show
        Jim Donofrio added a comment - - edited I uploaded a patch, is this what you had in mind Brock? If so I will commit this later
        Hide
        Goir Riog added a comment -

        I'm getting
        12/03/09 15:39:57 INFO junit.TestRunner: java.lang.NullPointerException
        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
        at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:42)
        at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:61)
        at org.apache.hadoop.mrunit.mapreduce.mock.MockContextWrapper$4.answer(MockContextWrapper.java:74)
        at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:29)
        at org.mockito.internal.MockHandler.handle(MockHandler.java:95)
        at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
        at org.apache.hadoop.mapreduce.Reducer$Context$$EnhancerByMockitoWithCGLIB$$2bef43d9.write(<generated>)

        since copy is still "null" on line 51 in Serialization.java ?!
        But i have no idea how to fix it.

        Show
        Goir Riog added a comment - I'm getting 12/03/09 15:39:57 INFO junit.TestRunner: java.lang.NullPointerException at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73) at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:42) at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:61) at org.apache.hadoop.mrunit.mapreduce.mock.MockContextWrapper$4.answer(MockContextWrapper.java:74) at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:29) at org.mockito.internal.MockHandler.handle(MockHandler.java:95) at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47) at org.apache.hadoop.mapreduce.Reducer$Context$$EnhancerByMockitoWithCGLIB$$2bef43d9.write(<generated>) since copy is still "null" on line 51 in Serialization.java ?! But i have no idea how to fix it.
        Hide
        Goir Riog added a comment -

        I've changes you test to use a ByteBuffer. Which throws the NullPointerException.

        public void testClassWithoutNoArgConstructor()

        { Configuration conf = new Configuration(); conf.setStrings("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization"); ByteBuffer bb1 = ByteBuffer.allocate(1); bb1.put((byte)0xFF); ByteBuffer bb2 = ByteBuffer.allocate(1); bb2.put((byte)0xFF); assertEquals(bb1, Serialization.copy(bb2, conf)); }
        Show
        Goir Riog added a comment - I've changes you test to use a ByteBuffer. Which throws the NullPointerException. public void testClassWithoutNoArgConstructor() { Configuration conf = new Configuration(); conf.setStrings("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization"); ByteBuffer bb1 = ByteBuffer.allocate(1); bb1.put((byte)0xFF); ByteBuffer bb2 = ByteBuffer.allocate(1); bb2.put((byte)0xFF); assertEquals(bb1, Serialization.copy(bb2, conf)); }
        Hide
        Jim Donofrio added a comment - - edited

        I am not familiar with Cassandra at all but you need to specify a different serialization since ByteBuffer does not implement Serializable. I think the patch is correct but I dont understand the other comment "the contract for deserialize only can fill in the object passed in, so the copy method will not perform as expected if it does not"

        Show
        Jim Donofrio added a comment - - edited I am not familiar with Cassandra at all but you need to specify a different serialization since ByteBuffer does not implement Serializable. I think the patch is correct but I dont understand the other comment "the contract for deserialize only can fill in the object passed in, so the copy method will not perform as expected if it does not"
        Hide
        Jim Donofrio added a comment -

        Ok I get it now, the contract for deserialize states that "If the object t is non-null then this deserializer may set its internal state to the next object read from the input stream. " So the deserialize method may still return a new object instead of changing the state if the object passed in. WritableSerialization changes the state of the non null object passed in. JavaSerialization ignores the object passed in and always return a new object.

        public T deserialize(T object) throws IOException {
        try

        { // ignore passed-in object return (T) ois.readObject(); }

        catch (ClassNotFoundException e)

        { throw new IOException(e.toString()); }

        }

        Show
        Jim Donofrio added a comment - Ok I get it now, the contract for deserialize states that "If the object t is non-null then this deserializer may set its internal state to the next object read from the input stream. " So the deserialize method may still return a new object instead of changing the state if the object passed in. WritableSerialization changes the state of the non null object passed in. JavaSerialization ignores the object passed in and always return a new object. public T deserialize(T object) throws IOException { try { // ignore passed-in object return (T) ois.readObject(); } catch (ClassNotFoundException e) { throw new IOException(e.toString()); } }
        Hide
        Jim Donofrio added a comment -

        added javadoc on Serialization class methods
        change copy(orig, copy, conf) to allow null for copy which means to create a new object
        change copy(orig, conf) to not use ReflectionUtils but to pass in null to copy(orig, copy, conf)
        acknowledge in javadoc that depending on the serialization class, copy may or may not copy orig into copy
        return the result of deserialize since depending on serialization class, copy may or may not copy orig into copy

        committed in 1299150

        Show
        Jim Donofrio added a comment - added javadoc on Serialization class methods change copy(orig, copy, conf) to allow null for copy which means to create a new object change copy(orig, conf) to not use ReflectionUtils but to pass in null to copy(orig, copy, conf) acknowledge in javadoc that depending on the serialization class, copy may or may not copy orig into copy return the result of deserialize since depending on serialization class, copy may or may not copy orig into copy committed in 1299150
        Hide
        Jim Donofrio added a comment -

        I dont consider the failure of ByteBuffer to be mrunit's problem, there should be a serialization class that you can set in io.serializations to get ByteBuffer to work

        Show
        Jim Donofrio added a comment - I dont consider the failure of ByteBuffer to be mrunit's problem, there should be a serialization class that you can set in io.serializations to get ByteBuffer to work
        Hide
        Brock Noland added a comment -

        LGTM! Thank you Jim!

        Also, yes I agree, I think to serialize a BB you'd need to set io.serializations.

        Show
        Brock Noland added a comment - LGTM! Thank you Jim! Also, yes I agree, I think to serialize a BB you'd need to set io.serializations.
        Hide
        Tarun Adiwal added a comment -

        Hi Jim,

        I have just upgrade from mrunit-0.8.0 to mrunit 0.9.0. I have following code in my reducer:

        context.write( null, new Text( key.toString().trim() +"_" + value.toString() ) );

        when I run my test the above code throws NullPointerException. The full stacktrace is given below :
        java.lang.NullPointerException
        at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:52)
        at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:81)
        at org.apache.hadoop.mrunit.mapreduce.mock.MockContextWrapper$4.answer(MockContextWrapper.java:78)
        at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
        at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
        at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
        at org.apache.hadoop.mapreduce.Reducer$Context$$EnhancerByMockitoWithCGLIB$$fd46c1a7.write(<generated>)

        As you said –
        change copy(orig, copy, conf) to allow null for copy which means to create a new object
        But I think when I am passing null to the context.write() method it is not creating a new object but giving NullPointerException at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:52).
        Can you please suggest.

        Thanks,
        Tarun

        Show
        Tarun Adiwal added a comment - Hi Jim, I have just upgrade from mrunit-0.8.0 to mrunit 0.9.0. I have following code in my reducer: context.write( null, new Text( key.toString().trim() +"_" + value.toString() ) ); when I run my test the above code throws NullPointerException. The full stacktrace is given below : java.lang.NullPointerException at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:52) at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:81) at org.apache.hadoop.mrunit.mapreduce.mock.MockContextWrapper$4.answer(MockContextWrapper.java:78) at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31) at org.mockito.internal.MockHandler.handle(MockHandler.java:97) at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47) at org.apache.hadoop.mapreduce.Reducer$Context$$EnhancerByMockitoWithCGLIB$$fd46c1a7.write(<generated>) As you said – change copy(orig, copy, conf) to allow null for copy which means to create a new object But I think when I am passing null to the context.write() method it is not creating a new object but giving NullPointerException at org.apache.hadoop.mrunit.Serialization.copy(Serialization.java:52). Can you please suggest. Thanks, Tarun
        Hide
        Jim Donofrio added a comment -

        Yes this is a bug, I did not think about the case where you want to pass null as a key or value which some output formats such as TextOutputFormat allow. Please open a JIRA for this problem, thanks for reporting this

        Show
        Jim Donofrio added a comment - Yes this is a bug, I did not think about the case where you want to pass null as a key or value which some output formats such as TextOutputFormat allow. Please open a JIRA for this problem, thanks for reporting this

          People

          • Assignee:
            Jim Donofrio
            Reporter:
            Meloss Xeloss
          • Votes:
            2 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development