Stefan Will added a comment - 20/Mar/09 06:16 PM
I'd like to see this fixed as well since one reason I've enabled map output compression is to reduce disk space usage by the mapreduce framework. It appears that currently the map outputs are simply decompressed as soon as they have been downloaded by the reducer.
I verified that the same thing happens in the map task: none of the intermediate.x files from o.a.h.mapred.Merger are being compressed.
This should fix the problem. I had to add a few new constructors. I left the old constructors that these files were using because I am not sure whether any other tasks use them. This patch applies to the 0.19 branch; I have not worked on trunk, so you might need to try a dry run before applying it to trunk. I tested it on my end and it works correctly now with this patch.
Billy Pearson made changes - 22/Mar/09 08:58 AM
Billy Pearson made changes - 22/Mar/09 08:59 AM
Billy Pearson made changes - 22/Mar/09 08:59 AM
Billy Pearson made changes - 22/Mar/09 08:59 AM
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12403378/5539.patch against trunk revision 756858.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/117/console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12403378/5539.patch against trunk revision 756858.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/118/console
This message is automatically generated.
Chris Douglas made changes - 22/Mar/09 11:21 PM
Billy Pearson made changes - 24/Mar/09 03:08 AM
Someone can use my patch as a starting point.
The ReduceTask.java call that is the problem is at line 2145. I use streaming without a combiner, so that path should also be checked to see whether it uses o.a.h.mapred.Merger. The basic problem is that the codec is not passed from these functions to the merger, so it is always null in the call to ... I think this is a major bug that affects the disk bandwidth of all MR jobs that use compression.
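To make the failure mode described above concrete, here is a minimal, self-contained sketch of a writer that silently falls back to uncompressed output when it receives a null codec. Everything in it is an assumption for illustration: `Codec`, `writeSegment`, `mergeDroppingCodec`, and `mergePassingCodec` are hypothetical names, and gzip from `java.util.zip` stands in for whatever codec the job is configured with. It does not reproduce the actual Merger or ReduceTask code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CodecPassThroughSketch {

    /** Hypothetical stand-in for a compression codec; not a Hadoop interface. */
    interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** Gzip-backed stand-in for the job's configured codec (an assumption). */
    static final Codec GZIP = raw -> new GZIPOutputStream(raw);

    /** Writes one merged segment; a null codec means the bytes go out uncompressed. */
    static byte[] writeSegment(byte[] records, Codec codec) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = (codec == null) ? sink : codec.wrap(sink)) {
            out.write(records);
        }
        return sink.toByteArray();
    }

    /** Mirrors the reported bug: the caller holds a codec but never hands it on. */
    static byte[] mergeDroppingCodec(byte[] records, Codec jobCodec) throws IOException {
        return writeSegment(records, null);
    }

    /** Mirrors the shape of the fix: the codec is threaded through to the writer. */
    static byte[] mergePassingCodec(byte[] records, Codec jobCodec) throws IOException {
        return writeSegment(records, jobCodec);
    }

    /** Builds a highly repetitive payload so that compression has a visible effect. */
    static byte[] sampleRecords() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("key\tvalue\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] records = sampleRecords();
        System.out.println("codec dropped: " + mergeDroppingCodec(records, GZIP).length + " bytes");
        System.out.println("codec passed:  " + mergePassingCodec(records, GZIP).length + " bytes");
    }
}
```

Because the null case fails silently rather than raising an error, a dropped codec shows up only as extra disk usage, which is consistent with how the problem was noticed in this thread.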
Billy Pearson made changes - 24/Mar/09 03:12 AM
It's a blocker; it will be resolved and backported to 0.20 at least. The road map isn't; the PA queue defines the set of patches that can be committed. The fix version is usually set when it's actually resolved, so where it was committed is documented.
Chris Douglas made changes - 24/Mar/09 04:06 AM
Oh, I see; the patch is for 0.19. My mistake.
Chris Douglas made changes - 24/Mar/09 05:16 AM
Billy Pearson made changes - 07/May/09 07:27 PM
The patch looks good. A few minor points:
Would you be able to provide patches for trunk and 20-branch as well?
I have too many things going on right now to make a new patch. Feel free to modify my patch to work the way you want and use it to build a patch for trunk. I would like to see this fixed in 0.20.1 if at all possible; this will be the one thing holding me up from upgrading to HBase 0.20 when it becomes ready.
Updated the patch to trunk
Jothi Padmanabhan made changes - 15/May/09 07:18 AM
Jothi Padmanabhan made changes - 15/May/09 07:18 AM
Could somebody review this patch? Thanks.
Patch looks good.
This patch clashes with
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12408232/hadoop-5539.patch against trunk revision 776352.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/testReport/
This message is automatically generated.
Jothi Padmanabhan made changes - 21/May/09 03:30 AM
Jothi Padmanabhan made changes - 21/May/09 03:30 AM
Jothi Padmanabhan made changes - 21/May/09 03:30 AM
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12408652/hadoop-5539-v1.patch against trunk revision 777761.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/testReport/
This message is automatically generated.
Patch for the 20 branch
Jothi Padmanabhan made changes - 27/May/09 09:39 AM
I just committed this. Thanks Jothi and Billy!
Devaraj Das made changes - 28/May/09 11:37 AM
Why no unit test? Why no javadoc for new methods?
If you tested this manually, what steps did you perform?
It is pretty difficult to write a unit test for this patch as this patch just enables compression during intermediate merges. The files that are created during the intermediate merges are consumed soon after they are created and the final merged file was compressed even without this patch. I did the same test as Billy had done – add print statements in the framework code (Merger.java) to verify if compression was turned on during intermediate merges.
The newly added methods are in Merger, which is a package-private class in mapred.
Billy, from this comment https://issues.apache.org/jira/browse/HADOOP-5539?focusedCommentId=12708570&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708570
No, I do not need it; I patched my version with my original patch for 0.19. Others might, though, since there are still a lot of older versions in production that will update to the 0.19 branch now that it has a few minor releases.
Integrated in Hadoop-trunk #863 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/)
Owen O'Malley made changes - 08/Jul/09 04:53 PM