|
[
Permlink
| « Hide
]
Arun C Murthy added a comment - 19/May/08 08:33 AM
Is this a 0.17.0 blocker?
Uh, and SequenceFile{Input|Output}Format & SequenceFileRecord{Reader|Writer} still relies on Writables too...
There is still clearly work to make the serialization framework fully supported in Hadoop, but I don't think the lack of support in SequenceFile.Reader is a blocker for 0.17.0.
The work done in On a related note, unfortunately the RecordReader interface is incompatible with serialization frameworks that don't reuse objects - like Java Serialization. The problem is that boolean next(K key, V value) throws IOException has no way of passing keys and values that are deserialized from the stream back to the client of the RecordReader. This is not a problem for Writables and Thrift since the client passes in objects that are updated in-place. To fix this will require some surgery on the API. I agree that this shouldn't be a blocker for 17 for the reasons Tom outlined.
Tom, makes sense - thanks for weighing in!
The minor jarring note is that a simple app which writes a SequenceFile with Java Serialization won't be able to read it back...
But it would have to write its own SequenceFileOutputFormat as things stand, so while possible, it's not likely to be a common occurrence. Thanks for picking this up Arun - I hope we can get this fixed in 0.18. There are also few other places that can remove their dependency on Writables without much effort - e.g. SequenceFileInputFilter, and MultipleOutputFormat (and subclasses). Please do not forget MapFile and related stuff.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12383235/hadoop-3413-v2.patch against trunk revision 661918. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2538/testReport/ This message is automatically generated. Doesn't this patch have the same problem that
boolean next(K key, V value) throws IOException; since clearly if you call it with next(null, null) you aren't going to be able to read your data. smile Owen,
I agree. This is a problem for the RecordReader interface, which will need work to change. However, I wasn't proposing we make such a radical change in this Jira, instead we are making it possible to use the serialization framework with SequenceFile.Reader, and removing unnecessary dependencies on Writable in some of the MapReduce layers (so you could at least use Thrift there). Note that the changes to SequenceFile.Reader are compatible with Java serialization, since instead of adding public boolean next(Object key, Object val) there's public Object next(Object key) and public Object getCurrentValue(Object val) Ah, ok. So if I understand, this will let you read java.io sequence files, but not use it for map reduce input, right?
I really think this should have unit tests, before it gets committed.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12383547/hadoop-3413-v4.patch against trunk revision 663889. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2606/testReport/ This message is automatically generated. I just committed this, after fixing to reflect trunk changes. Thanks, Tom!
Integrated in Hadoop-trunk #520 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/520/
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||