This is a rough patch that isn't quite working yet.
This patch changes the ValueIterator to keep 2 instances of the key and 1 instance of the object, and to reuse those objects during the iteration. I also fixed some of the compiler warnings for unbound generic types in the ValueIterator.
This patch fixes the value iterator to reuse the key and value between iterations. Aggregation was assuming that the reduce inputs were not reused, so I stringified the value. Is that ok, Runping? I got a minor speed up of 2:33 instead of 2:37 on a simple 1 node word count.
+1 The change wrt UniqValueCount class looks fine.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12375966/2399-3.patch against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 8 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1817/testReport/
This message is automatically generated.
+1 (although it would be nice to have benchmark figures)
This got affected by
This patch brings it up to the current trunk.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377559/2399-4.patch against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 8 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1939/testReport/
This message is automatically generated.
The test failure is in HDFS, which I didn't change at all.
I just committed this. Thanks, Owen!
Integrated in Hadoop-trunk #427 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/427/)
Integrated in Hadoop-trunk #444 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/444/)
This piece of code will print different output with Hadoop 17 (compared to Hadoop 16).
    public void reduce(... Iterator<Writable> aValues...) throws IOException {
        System.out.println("First");
        System.out.println("Second");
In Hadoop 16, the values printed after First and Second were the same. I guess this is the consequence of this JIRA.
This Jira was marked as an incompatible change because it did change the semantics. However, without this change there was an allocation (and later garbage collection) for every key and value passed to the reduce, which had measurable performance costs.
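To make the semantic change concrete, here is a minimal, self-contained sketch of what happens when an iterator hands back the same object on every call. It does not use the Hadoop API: ReuseDemo, MutableText, and ReusingIterator are hypothetical names that only stand in for a Writable value and a reusing value iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    /** Hypothetical mutable value, standing in for a Writable. */
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        String get() { return value; }
        MutableText copy() {
            MutableText c = new MutableText();
            c.set(value);
            return c;
        }
    }

    /** Hypothetical iterator that returns the same instance from every next(). */
    static final class ReusingIterator implements Iterator<MutableText> {
        private final Iterator<String> source;
        private final MutableText shared = new MutableText();
        ReusingIterator(List<String> data) { source = data.iterator(); }
        @Override public boolean hasNext() { return source.hasNext(); }
        @Override public MutableText next() {
            shared.set(source.next()); // overwrite in place, no new allocation
            return shared;
        }
    }

    /** Holds on to the references; every element ends up showing the last value. */
    static List<String> holdReferences(List<String> data) {
        List<MutableText> held = new ArrayList<>();
        Iterator<MutableText> it = new ReusingIterator(data);
        while (it.hasNext()) {
            held.add(it.next());
        }
        List<String> out = new ArrayList<>();
        for (MutableText t : held) {
            out.add(t.get());
        }
        return out;
    }

    /** Copies each value before advancing, so the contents survive. */
    static List<String> copyValues(List<String> data) {
        List<MutableText> held = new ArrayList<>();
        Iterator<MutableText> it = new ReusingIterator(data);
        while (it.hasNext()) {
            held.add(it.next().copy());
        }
        List<String> out = new ArrayList<>();
        for (MutableText t : held) {
            out.add(t.get());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(holdReferences(data)); // prints [c, c, c]
        System.out.println(copyValues(data));     // prints [a, b, c]
    }
}
```

Code that needs a value after the next call to next() has to copy it first (or stringify it, as was done for Aggregation above); the cost of that copy is then paid only by the code that actually needs it.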
As a general rule, I think applications should not expect to be able to hold on to pointers to objects passed to them, but should expect to be able to hold on to pointers returned to them. Lots of exceptions of course, but, in this case, I don't think applications should be expecting to be able to hold on to these objects, and so any that break if we reuse them were not well written.
These were originally reused. Reuse was removed when the combiner was added, since the original combiner kept pointers to the objects.