I tested this on a 100-node cluster (98 tasktrackers) using sort. Given 300MB/node of data, a sufficiently large io.sort.mb and fs.inmemory.size.mb, io.sort.spill.percent=1.0, fs.inmemory.merge.threshold=0, and mapred.inmem.usage=1.0, each reduce took an average of 121 seconds when reading from disk vs 79 seconds when merging and reducing from memory. The sort with the patch finished the job in 8 minutes instead of 9, but both runs had slow tasktrackers that threw off the running time.
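For reference, the relative improvement implied by those numbers can be worked out as below. This is only an arithmetic sketch using the figures quoted above; the helper name relative_reduction is made up for illustration, and the job-level figure should be read with the slow tasktrackers in mind.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# Average reduce time: 121s reading from disk vs 79s reducing from memory.
reduce_gain = relative_reduction(121.0, 79.0)

# Whole job: 9 minutes without the patch vs 8 minutes with it.
job_gain = relative_reduction(9.0, 8.0)

print(f"reduce time: {reduce_gain:.1%} shorter")
print(f"job time: {job_gain:.1%} shorter")
```

That is roughly a 35% shorter reduce on average, against about 11% for the job as a whole.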
This also includes some similar changes to MapTask, letting the record and serialization buffer soft limits be configured separately. This passes mapred/hdfs tests and patch validation on my machine and doesn't break LocalJobRunner (unlike 3446-0).
[exec] -1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
The "javadoc warning" is from: [javadoc] javadoc: warning - Multiple sources of package comments found for package "org.apache.commons.logging"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "org.apache.commons.logging.impl"
This changes reduce as follows:
This passes all unit tests on my machine. I'll work on measuring its performance and post the results presently.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389141/3446-2.patch against trunk revision 690142.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3143/testReport/
This message is automatically generated.
You need to add some tests for this. You should also have some forrest edits to explain the usage of the config variables.
I'll add unit tests/docs with the next patch.
As a benchmark, I tried RandomWriter on 19 TaskTrackers, 1GB/node, followed by several sort runs. The max heap memory is set to 512MB, mapred.copy.inmem.percent to 0.8, dfs.replication to 1. The times recorded are the min/max/avg time for the reduce from the end of the shuffle to the end of the reduce. Params are formatted as: io.sort.factor/mapred.inmem.merge.threshold/mapred.inmem.merge.usage/mapred.reduce.inmem.percent
Added a unit test. I'm not sure where documentation for the new parameters belongs...
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389329/3446-4.patch against trunk revision 691099.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3158/testReport/
This message is automatically generated.
Move unrelated changes to MapTask into a separate JIRA (
This looks good, but I think we should define the new parameter mapred.reduce.inmem.percent as a percent of the total heap size rather than a percent of mapred.copy.inmem.percent.
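To make the difference between the two definitions concrete, here is a rough sketch of the arithmetic. The function names are hypothetical, the 512MB heap and mapred.copy.inmem.percent=0.8 come from the benchmark setup above, and the 0.5 used for mapred.reduce.inmem.percent is an assumed example value, not a recommended setting.

```python
def budget_relative_mb(heap_mb, copy_inmem_percent, reduce_inmem_percent):
    """Reduce-side budget when the reduce percent is a fraction of the
    memory already set aside by mapred.copy.inmem.percent."""
    return heap_mb * copy_inmem_percent * reduce_inmem_percent


def budget_absolute_mb(heap_mb, reduce_inmem_percent):
    """Reduce-side budget when the reduce percent is a fraction of the
    total heap, independent of mapred.copy.inmem.percent."""
    return heap_mb * reduce_inmem_percent


heap_mb = 512          # max heap from the benchmark above
copy_percent = 0.8     # mapred.copy.inmem.percent from the benchmark above
reduce_percent = 0.5   # assumed example value

relative = budget_relative_mb(heap_mb, copy_percent, reduce_percent)
absolute = budget_absolute_mb(heap_mb, reduce_percent)

print(f"relative to copy buffer: {relative:.1f} MB")
print(f"relative to total heap:  {absolute:.1f} MB")
```

With the heap-relative definition, the amount of memory does not shift when mapred.copy.inmem.percent is retuned, which is presumably the appeal of defining it that way.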
I'd also change the names to:
Other than that, it looks good.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389455/3446-5.patch against trunk revision 692335.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3184/testReport/
This message is automatically generated.
Changed config var names, semantics of reduce percentage, and updated documentation & tests to reflect this
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389602/3446-6.patch against trunk revision 692597.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3193/testReport/
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389772/3446-7.patch against trunk revision 693587.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3222/testReport/
This message is automatically generated.
The test failure is not related.
HADOOP-2095, Someone will try to get a new solution for this into 0.19.