Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      New Avro serialization in .../io/serializer/avro.

Description

Support to serialize and deserialize Avro types in Hadoop.

      1. 6120_v6.patch
        28 kB
        Sharad Agarwal
      2. 6120_v5.patch
        28 kB
        Sharad Agarwal
      3. HADOOP-6120.patch
        24 kB
        Doug Cutting
      4. 6120_v4.patch
        24 kB
        Sharad Agarwal
      5. 6120_v3.patch
        18 kB
        Sharad Agarwal
      6. 6120_v2.patch
        17 kB
        Sharad Agarwal
      7. 6120_v1.patch
        4 kB
        Sharad Agarwal

        Issue Links

          Activity

          Sharad Agarwal added a comment -

          The Serialization interface can be implemented for Avro types. Currently Avro has three types in Java: specific, reflect and generic. I think the current Serialization interface is fine for implementing the 'specific' type, since the schema information can be inferred from the class. The schema can't be inferred from the 'reflect' and 'generic' type classes, so the current Serialization interface doesn't seem to be sufficient to implement these. I think the Serialization interface needs to be extended to pass on additional info in order to support 'generic' and 'reflect'.
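
The distinction above can be sketched in plain Java. This is a minimal hypothetical illustration, not Avro's real API: `HasSchema`, `User`, and `AcceptSketch` are stand-ins invented here. For 'specific' records the schema is recoverable from the class alone, so an `accept(Class)`-style check suffices; 'generic' and 'reflect' types have no such static class-to-schema mapping.

```java
// Hypothetical stand-in for Avro's SpecificRecord: the class itself
// carries its schema, so accept(Class) can decide without extra info.
interface HasSchema { String schema(); }

class User implements HasSchema {
    public String schema() { return "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}"; }
}

public class AcceptSketch {
    static boolean accept(Class<?> c) {
        // 'specific' style: the schema is inferable from the class itself
        return HasSchema.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        System.out.println(accept(User.class));    // a specific-style record
        System.out.println(accept(String.class));  // an arbitrary class
    }
}
```

For 'generic' and 'reflect' data no such check exists on the class, which is why the comment argues the interface must be extended to pass additional information.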

          Sharad Agarwal added a comment -

          An early patch adding AvroSpecificSerialization. Test case not added yet. Include the avro jar in the classpath to build with this patch.
          An Avro release is around the corner. Once it is released we can check the avro jar in to the lib folder until it is published to the Maven repos.

          Philip Zeyliger added a comment -

          This is perhaps more of an AVRO question, but, out of curiosity, is there a way to avoid this cast? Could SpecificDatumReader.read() return SpecificRecord, or does the inheritance model forbid it?

          return (SpecificRecord) reader.read(t, decoder);
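
The generics point in the question can be illustrated without Avro. This is a hedged sketch with an invented `Reader` class (not `SpecificDatumReader`): if the reader is parameterized on the record type, the call site needs no cast.

```java
// Hypothetical reader parameterized on its datum type D, so read()
// returns D directly and the caller never casts.
class Reader<D> {
    private final D datum;
    Reader(D datum) { this.datum = datum; }
    D read() { return datum; }   // returns D, no cast at the call site
}

public class CastSketch {
    public static void main(String[] args) {
        Reader<String> r = new Reader<>("record");
        String rec = r.read();   // no (String) cast required
        System.out.println(rec);
    }
}
```

Whether Avro's `SpecificDatumReader` could be typed this way depends on its inheritance hierarchy, which is exactly what the comment asks.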

          Doug Cutting added a comment -

          This looks useful. A few comments:

          • Support for generic data will eventually be needed but will require changes to Avro's serialization API that we should address in a separate issue.
          • I think we can use ReflectData#getSchema() to implement a reflect-based Serializer. The only tricky bit will be the 'accept()' implementation, which might use a configuration property and/or a tag interface (e.g., AvroReflectSerializeable).
          • The call to ReflectionUtils#newInstance() in deserialize is not required.
          • The schema might be created in the constructor rather than in the open() method.
          Sharad Agarwal added a comment -

          This patch:

          • addresses Doug's concerns
          • moves the common code in abstract base class AvroSerialization
          • implements 'reflect' serialization based on Reflect#getSchema. The accept call is based on two mechanisms:
            tag interface - AvroReflectSerializable.
            package names - applications can configure which packages should be treated as reflect types, and can then use this with their existing classes without needing to implement the AvroReflectSerializable interface.
          • adds testcase
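
The two accept() mechanisms described above can be sketched as follows. This is a simplified illustration, not the patch's actual code: `ReflectAcceptSketch` and `MyRecord` are invented here, and the real implementation reads the package list from the Hadoop Configuration rather than a constructor.

```java
import java.util.HashSet;
import java.util.Set;

// Stand-in for the patch's tag interface: a marker with no methods.
interface AvroReflectSerializable {}

class MyRecord implements AvroReflectSerializable {}

public class ReflectAcceptSketch {
    private final Set<String> packages = new HashSet<>();

    ReflectAcceptSketch(String... pkgs) {
        for (String p : pkgs) packages.add(p);
    }

    boolean accept(Class<?> c) {
        // Mechanism 1: the class implements the tag interface
        if (AvroReflectSerializable.class.isAssignableFrom(c)) return true;
        // Mechanism 2: the class lives in a configured package
        return c.getPackage() != null
            && packages.contains(c.getPackage().getName());
    }

    public static void main(String[] args) {
        ReflectAcceptSketch s = new ReflectAcceptSketch("java.util");
        System.out.println(s.accept(MyRecord.class));             // via tag interface
        System.out.println(s.accept(java.util.ArrayList.class));  // via package list
        System.out.println(s.accept(String.class));               // neither
    }
}
```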
          Doug Cutting added a comment -

          Looks good. Some comments:

          • in AvroSerialization, s/GenericDatum{Reader,Writer}/Datum{Reader,Writer}/? All implementations will probably always use a subclass of Generic, but it feels to me like this API should be more abstract, no?

          • in AvroReflectSerialization, 'List<String> packages' might be more efficient as Set<String>?
          • in AvroReflectSerialization, getPackages() could use Configuration#getStrings().
          • does Avro's reflect API actually require that fields are public? Specific always generates public fields, but I think reflect should be able to serialize and deserialize private fields. If not, maybe we should fix that in Avro?
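
The Set-over-List and getStrings() suggestions above amount to splitting a comma-separated configuration value into a Set for O(1) membership tests in accept(). This is a hedged sketch under that assumption; it mimics what Configuration#getStrings() does rather than calling the Hadoop API.

```java
import java.util.HashSet;
import java.util.Set;

public class PackagesSketch {
    // Parse a comma-separated property value (as Configuration#getStrings
    // would supply it) into a Set<String> for fast package lookups.
    static Set<String> getPackages(String confValue) {
        Set<String> out = new HashSet<>();
        if (confValue == null) return out;
        for (String s : confValue.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> p = getPackages("com.example.records, com.example.types");
        System.out.println(p.contains("com.example.records"));
        System.out.println(p.size());
    }
}
```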
          Sharad Agarwal added a comment -

          All implementations will probably always use a subclass of Generic, but it feels to me like this API should be more abstract, no?

          Absolutely. I realized that but missed using it in the patch.

          reflect should be able to serialize and deserialize private fields. If not, maybe we should fix that in Avro?

          I have filed AVRO-78. In places the Avro code uses Class#getFields() instead of Class#getDeclaredFields().

          Sharad Agarwal added a comment -

          Incorporated review comments.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12413524/6120_v3.patch
          against trunk revision 793987.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/573/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/573/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/573/console

          This message is automatically generated.

          Sharad Agarwal added a comment -

          Hudson failed because the patch needs the avro jar.

          All tests passed locally.
          test-patch output:

          -1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 9 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     -1 release audit.  The applied patch generated 265 release audit warnings (more than the trunk's current 264 warnings).
          

          Release audit warnings are due to test file avroRecord.avsc not having license header.

          Tom White added a comment -

          Looks good. Minor nits:

          • Move the testSerialization() method from TestWritableSerialization since it is now being used from the Avro tests. Something like SerializationTestUtil.
          • Add a package.html to the org.apache.hadoop.io.serializer.avro package.

          Have you tried using either of these serializations with MapReduce? Something like TestJavaSerialization would be good (but this could be another issue).

          Doug Cutting added a comment -

          Looks great!

          • shouldn't Serializer/Deserializer#close() call encoder/decoder.close()?
          • should AvroReflectSerialization#accept() be synchronized? I don't know if we generally require Serializers to be thread-safe, but it's probably best and should not have much performance impact.
          Doug Cutting added a comment -

          > Hudson failed because patch needs avro jar.

          The avro jar should soon appear in the Maven repo. Then we'll be able to just add it as a dependency in ivy.xml.

          Sharad Agarwal added a comment -

          shouldn't Serializer/Deserializer#close() call encoder/decoder.close()?

          I don't see that encoder/decoder has a close() method. Am I missing something?

          Sharad Agarwal added a comment -

          Taken care of Tom and Doug's comments:

          1. Moved the testSerialization() method from TestWritableSerialization
          2. Added package.html to the org.apache.hadoop.io.serializer.avro
          3. Made AvroReflectSerialization#accept() synchronized
          Sharad Agarwal added a comment -

          Have you tried using either of these serializations with MapReduce? Something like TestJavaSerialization would be good (but this could be another issue).

          Now after the project split this has to be another issue. :)

          Doug Cutting added a comment -

          > I don't see encoder/decoder has close() method. Am I missing something?

          No, it seems Avro is. I think the contract for serializers and deserializers is that they can close the underlying stream. So, until we add Encoder/Decoder#close() to Avro, we should explicitly save a pointer to the stream passed in and close it after flushing the encoder/decoder.

          A few other comments.

          • AvroSerialization should be public, since public classes extend it.
          • Every public or protected method and field in public classes should have javadoc, unless it's inherited.
          • We should add a bit more documentation at the package and class level about how this is to be used, e.g., explain that reflect serialization will work for any class that's either in the configured package list or that implements the tag interface, and that specific serialization will just work for generated classes.
          • Is there any reason not to add these serializers to io.serializations by default?
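
The close() workaround described above — keep a reference to the stream passed to open() and close it after flushing, since (at the time) Avro's Encoder/Decoder had no close() — can be sketched like this. The class and method names here are illustrative stand-ins, not the patch's exact code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SerializerCloseSketch {
    private OutputStream out;

    // Serializer#open: remember the underlying stream ourselves, because
    // the (hypothetical) encoder wrapping it exposes no close().
    void open(OutputStream stream) { this.out = stream; }

    void close() throws IOException {
        out.flush();   // flush any buffered encoder output first
        out.close();   // then close the underlying stream explicitly
    }

    public static void main(String[] args) throws IOException {
        SerializerCloseSketch s = new SerializerCloseSketch();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        s.open(bos);
        s.close();
        System.out.println("closed without error");
    }
}
```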
          Doug Cutting added a comment -

          Attaching a patch that's identical to Sharad's last patch, except that Avro is now retrieved from the Maven repo. This should now hopefully pass Hudson tests.

          Doug Cutting added a comment -

          Marking as "Patch Available" to see if Hudson can now find Avro.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12413707/HADOOP-6120.patch
          against trunk revision 793987.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 261 release audit warnings (more than the trunk's current 260 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/576/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/576/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/576/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/576/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/576/console

          This message is automatically generated.

          Doug Cutting added a comment -

          Everything passed but the release audit. Not bad.

          The "release audit warnings" link above is broken, but the offending file is avroRecord.avsc. We should be able to put an Apache license at the top of that file, since Jackson, the JSON parser that Avro uses, permits C-style comments.

          Sharad Agarwal added a comment -

          Implemented AvroSerialization#close, added more javadoc, and made AvroSerialization public.

          Is there any reason not to add these serializers to io.serializations by default?

          I didn't add them, considering JavaSerialization is also not in the defaults. But I think it's OK to have these. Added both the Avro and Java serializations to core-default.xml and SerializationFactory.
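
A core-default.xml entry along these lines is what the comment describes — note the exact class list is an assumption for illustration, and a later comment in this thread removes JavaSerialization from the defaults again:

```xml
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,
         org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,
         org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
</property>
```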

          Tom White added a comment -

          JavaSerialization was written more to demonstrate the generality of the abstraction (HADOOP-1986) than to be used in real-world applications, since its performance is considerably worse than Writables' (and presumably Avro's). So I would be inclined not to add it to the default list, preferring that users explicitly enable it. Does this sound reasonable?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12413792/6120_v5.patch
          against trunk revision 795028.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/579/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/579/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/579/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/579/console

          This message is automatically generated.

          Doug Cutting added a comment -

          Sigh. I just realized this is really fragile, since we don't permit one to specify a different schema when reading.

          We need to get metadata from the serializer used to write data to the deserializer used to read it. Even for Writable, we should check that the data was indeed written with WritableSerializer, since folks can configure a given class to be serialized in different ways.

          So we might add methods like:

          • Deserializer Serialization#getDeserializer(Map<String,String> meta);
          • Serializer Serialization#getSerializer(Map<String,String> meta);
          • Map<String,String> Serializer#getMeta();

          and alter, e.g., SequenceFile and other containers to use these.

          Perhaps this belongs in another Jira, but without it we're missing out on a lot of Avro's functionality.
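
The proposal above can be sketched as follows. These names come from Doug's suggestion, not from a shipped Hadoop interface, and the "avro.schema" metadata key is invented here for illustration: a container records the serializer's metadata at write time and hands it back at read time, so a reader schema can differ from the writer schema.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed metadata-carrying serializer.
interface MetaSerializer {
    Map<String, String> getMeta();
}

class SchemaSerializer implements MetaSerializer {
    private final String writerSchema;
    SchemaSerializer(String schema) { this.writerSchema = schema; }

    public Map<String, String> getMeta() {
        Map<String, String> m = new HashMap<>();
        m.put("avro.schema", writerSchema);            // hypothetical key
        m.put("serialization", getClass().getName());  // which impl wrote the data
        return m;
    }
}

public class MetaSketch {
    public static void main(String[] args) {
        MetaSerializer s = new SchemaSerializer("{\"type\":\"string\"}");
        // A container (e.g. SequenceFile) would persist this map in its
        // header and pass it to Serialization#getDeserializer(meta) on read.
        Map<String, String> meta = s.getMeta();
        System.out.println(meta.get("avro.schema"));
    }
}
```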

          Sharad Agarwal added a comment -

          JavaSerialization was written more to demonstrate the generality of the abstraction

          I am fine with removing JavaSerialization from the defaults, if folks think it will discourage people from using java serialization.

          folks can configure a given class to be serialized in different ways.

          Do you mean that accept(Class<?> c) should select the serialization not only based on the class but also based on the metadata passed? We need to change accept as well, no?

          Perhaps this belongs in another Jira, but without it we're missing out on a lot of Avro's functionality.

          I think this can be discussed in another Jira, as the current implementation should be fine in the case where reader and writer use the same schema. We can start using this in mapreduce jobs.

          Doug Cutting added a comment -

          If, e.g., a class both implements Writable and also either implements AvroReflectSerializable or has its package listed in the config, then which serialization is used is determined by the order of the serializers in the config, which could change. Similarly for the Serializable interface and JavaSerialization. So it's not safe to assume that the Class->Serialization map is fixed, and we should really be storing at least the serialization's name in container metadata, and probably also a version (e.g. serialVersionUID for JavaSerialization). And once we have a mechanism to support that, we can also store other metadata, like the schema, so that we can read older versions and generic data.

          +1 for removing JavaSerialization from the defaults.

          +1 for a separate Jira on serialization metadata. But it would be best to have such metadata by the 0.21 freeze, in two weeks.
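          The ordering hazard described above can be shown with a toy first-match lookup. The names below (Ser, pick, WritableLike) are illustrative stand-ins, loosely modeled on how a factory returning the first Serialization whose accept(Class) is true would behave; they are not Hadoop's actual API.

```java
import java.io.Serializable;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: when a class matches two registered serializations,
// the first accept(Class) that returns true wins, so config order matters.
public class FirstAcceptWins {

  /** A registered serialization: a name plus its accept(Class) test. */
  public static class Ser {
    final String name;
    final Predicate<Class<?>> accepts;
    Ser(String name, Predicate<Class<?>> accepts) {
      this.name = name;
      this.accepts = accepts;
    }
  }

  public interface WritableLike {}                 // stand-in for Writable

  /** A class that matches both registered serializations. */
  public static class Both implements WritableLike, Serializable {}

  public static final Ser WRITABLE =
      new Ser("writable", WritableLike.class::isAssignableFrom);
  public static final Ser JAVA =
      new Ser("java", Serializable.class::isAssignableFrom);

  /** First-match lookup over the registered serializations, in order. */
  public static String pick(List<Ser> registered, Class<?> c) {
    for (Ser s : registered) {
      if (s.accepts.test(c)) return s.name;
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(pick(List.of(WRITABLE, JAVA), Both.class)); // writable
    System.out.println(pick(List.of(JAVA, WRITABLE), Both.class)); // java
  }
}
```

          Recording the serialization's name (and a version) in the container's metadata, as suggested above, removes this ambiguity on the read path.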

          Sharad Agarwal added a comment -

          Removed JavaSerialization from the defaults.
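          With JavaSerialization out of the defaults, which serializations are in play is governed by the io.serializations property. A configuration along these lines (class names believed current as of the 0.21 codebase; verify against your build) registers the Writable and Avro serializations, and users who still want Java serialization can append JavaSerialization themselves:

```xml
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
</property>
```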

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12414061/6120_v6.patch
          against trunk revision 796052.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/582/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/582/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/582/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/582/console

          This message is automatically generated.

          Sharad Agarwal added a comment -

          Test failure is unrelated.

          Tom White added a comment -

          Opened HADOOP-6165 for discussion about serialization metadata.

          Doug Cutting added a comment -

          I just committed this. Thanks, Sharad!

          Tsz Wo Nicholas Sze added a comment -

          After updating hadoop-core-0.21.0-dev.jar and hadoop-core-test-0.21.0-dev.jar on hdfs, some tests are failing with "java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader". See HDFS-534.

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.


            People

            • Assignee: Sharad Agarwal
            • Reporter: Sharad Agarwal
            • Votes: 0
            • Watchers: 13