|
[
Permlink
| « Hide
]
Amareshwari Sriramadasu added a comment - 31/Mar/08 10:51 AM
Here is patch removing FileSplit.getFile() and LineRecordReader.readLine()
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12378944/patch-2826.txt against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. patch -1. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2102/console This message is automatically generated. instead of public static byte [] readLine(LineReader lineReader)
why not pass a text object ? public static readLine(LineReader lineReader, Text t) read into t the call would be something like .... that would save creating object Text for each readline invocation? Canceling patch to address Mahadev's comment.
Here is patch addressing Mahadev's suggestion.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379102/patch-2826.txt against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/testReport/ This message is automatically generated. Cancelling the patch, because In the patch pipemapred does a arrayCopy to get byteArray of the Text's length. But Text is doing a arrayCopy for readLine. This involves two array copies for each readline call of streaming.
This patch removes the arrayCopy added in earlier patch and byte array is passed along with its for the split method. This solution should be
better than the earlier since we are avoiding the additional arraycopy. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379123/patch-2826.txt against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/testReport/ This message is automatically generated. Removed javadoc warning.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379229/patch-2826.txt against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/testReport/ This message is automatically generated. I just committed this. Thanks, Amareshwari!
Integrated in Hadoop-trunk #451 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/451/
i was pulling these changes into our production environment. quick comment:
we copy over an entire line (from readLine) and then we break it into two strings by splitting on tab. So there is an extra scan of the input data and an extra copy based on splitting by tab. instead if we generalized LineReader to instead read until it hits a delimiter - then we can do it with one less scan and copy. Something like: byte [] tabDelimiter = new byte [1]; tabDelimiter[0] = '\t'; while() { getting rid of one copy and scan would be pretty big! |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||