|
[
Permlink
| « Hide
]
Sharad Agarwal added a comment - 30/Jun/09 06:30 AM
The Serialization interface can be implemented for Avro types. Currently Avro has 3 types in java: specific, reflect and generic. I think the current Serialization interface is fine to implement 'specific' type as the schema information can be inferred from the class. The schema can't be inferred from the 'reflect' and 'generic' type classes. So the current Serialization interface doesn't seem to be sufficient to implement these. I think the Serialization interface need to be extended to pass on additional info to have implementation for 'generic' and 'reflect'.
An early patch having AvroSpecificSerialization. Test case not added yet. Include avro jar in classpath to build with this patch.
Avro release is round the corner. Once it is released we can checkin avro jar in lib folder till it is published to maven repos. This is perhaps more of an AVRO question, but, out of curiosity, is there a way to avoid this cast? Could SpecificDatumReader.read() return SpecificRecord, or does the inheritance model forbid it?
This looks useful. A few comments:
This patch:
Looks good. Some comments:
Absolutely. I realized that but missed to use it in the patch.
I have filed Incorporated review comments.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12413524/6120_v3.patch against trunk revision 793987. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to cause Findbugs to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/573/testReport/ This message is automatically generated. Hudson failed because patch needs avro jar.
All tests passed locally. -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 9 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 265 release audit warnings (more than the trunk's current 264 warnings).
Release audit warnings are due to test file avroRecord.avsc not having license header. Looks good. Minor nits:
Have you tried using either of these serializations with MapReduce? Something like TestJavaSerialization would be good (but this could be another issue). Looks great!
> Hudson failed because patch needs avro jar.
The avro jar should soon appear in the Maven repo. Then we'll be able to just add it as a dependency in ivy.xml.
I don't see encoder/decoder has close() method. Am I missing something? Taken care of Tom and Doug's comments:
Now after the project split this has to be another issue. smile > I don't see encoder/decoder has close() method. Am I missing something?
No, it seems Avro is. I think the contract for serializers and deserializers is that they can close the underlying stream. So, until we add Encoder/Decoder#close() to Avro, we should explicitly save a pointer to the stream passed in and close it after flushing the encoder/decoder. A few other comments.
Attaching a patch that's identical to Sharad's last patch, except that Avro is now retrieved from the Maven repo. This should now hopefully pass Hudson tests.
Marking as "Patch Available" to see if Hudson can now find Avro.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12413707/HADOOP-6120.patch against trunk revision 793987. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 261 release audit warnings (more than the trunk's current 260 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/576/testReport/ This message is automatically generated. Everything passed but the release audit. Not bad.
The "release audit warnings" link above is broken, but the offending file is avroRecord.avsc. We should be able to put an Apache license at top of that file, since Jackson, the JSON parser that Avro uses, permits C-style comments. Implemented AvroSerialization#close, added more javadoc, made AvroSerialization as public.
I didn't add considering JavaSerialization also not there in default. But I think its ok to have these. Added both Avro and java serializations to core-default.xml and SerializationFactory. JavaSerialization was written more to demonstrate the generality of the abstraction (
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12413792/6120_v5.patch against trunk revision 795028. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/579/testReport/ This message is automatically generated. Sigh. I just reallized this is really fragile, since we don't permit one to specify a different schema to read things.
We need to get metadata from the serializer used to write data to the deserializer used to deserialize it. For even Writable, we should check that the data was indeed written with WritableSerializer, since folks can configure a given class to be serialized in different ways. So we might add methods like:
Perhaps this belongs in another Jira, but without it we're missing out on a lot of Avro's functionality.
I am fine removing JavaSerialization from the defaults, if folks think it will discourage people to use java serialization.
Do you mean that accept(Class<?> c) should select serialization not only based on class but also based on the meta data passed ? We need to change accept as well, no ?
I think this can be discussed in another Jira, as the current implementation should be fine in case reader and writer are for same schema. We can start using this in mapreduce jobs. If, e.g., a class both implements Writable and also either implements AvroReflectSerializeable or whose package is listed in the config, then which serialization is used is determined by the order of the serializers in the config, which could change. Similarly for the Serializeable interface and JavaSerialization. So its not safe to assume that the Class->Serialization map is fixed, and we should really be storing at least the serialization's name in container metadata, and probably also a version (e.g. serialVersionUID for JavaSerialization). And once we have a mechanism to support that, we can also store other metadata, like the schema, so that we can read older versions and generic data.
+1 for removing JavaSerialization from the defaults. +1 for a separate Jira on serialization metadata. But it would be best to have such metadata by the 0.21 freeze, in two weeks. Removed JavaSerialization from defaults
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12414061/6120_v6.patch against trunk revision 796052. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/582/testReport/ This message is automatically generated. Test failure is unrelated.
Opened
I just committed this. Thanks, Sharad!
After updated hadoop-core-0.21.0-dev.jar and hadoop-core-test-0.21.0-dev.jar on hdfs, some tests are failing with "java.lang.NoClassDefFoundError: org/apache/avro/io/DatumReader
". See Editorial pass over all release notes prior to publication of 0.21.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||