Hadoop Common
HADOOP-3446

The reduce task should not flush the in memory file system before starting the reducer

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In the case where the entire reduce inputs fit in ram, we currently force the input to disk and re-read it before giving it to the reducer. It would be much better if we merged from the ramfs and any spills to feed the reducer its input.
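The merge the description asks for can be sketched as a k-way merge that feeds the reducer directly from sorted in-memory segments and sorted on-disk spills, rather than flushing everything to disk first. This is an illustrative sketch only (the class and method names are hypothetical, not the actual Hadoop Merger API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SegmentMerge {
  /** Merge any number of pre-sorted segments into one sorted stream. */
  static List<Integer> merge(List<List<Integer>> segments) {
    // Heap entries: {value, segmentIndex, offsetWithinSegment}
    PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int s = 0; s < segments.size(); s++) {
      if (!segments.get(s).isEmpty()) {
        heap.add(new int[] {segments.get(s).get(0), s, 0});
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      out.add(top[0]);                      // emit the smallest head
      List<Integer> seg = segments.get(top[1]);
      int next = top[2] + 1;
      if (next < seg.size()) {              // advance within that segment
        heap.add(new int[] {seg.get(next), top[1], next});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Two "in-memory" segments and one "on-disk spill", all pre-sorted.
    List<List<Integer>> segments = List.of(
        List.of(1, 4, 9), List.of(2, 3, 8), List.of(5, 6, 7));
    System.out.println(merge(segments)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
  }
}
```

Because every source is already sorted, the reducer can consume the merged stream incrementally; no segment needs to be written out and re-read just to change where it lives.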

      Attachments

      1. 3446-0.patch
        14 kB
        Chris Douglas
      2. 3446-1.patch
        15 kB
        Chris Douglas
      3. 3446-2.patch
        30 kB
        Chris Douglas
      4. 3446-3.patch
        35 kB
        Chris Douglas
      5. 3446-4.patch
        45 kB
        Chris Douglas
      6. 3446-5.patch
        41 kB
        Chris Douglas
      7. 3446-6.patch
        41 kB
        Chris Douglas
      8. 3446-7.patch
        41 kB
        Chris Douglas

        Issue Links

          Activity

          Owen O'Malley added a comment -

          Unfortunately, the patch that I used is now hopelessly out of date after HADOOP-2095. Someone will try to get a new solution for this into 0.19.

          Chris Douglas added a comment -

          I tested this on a 100 node cluster (98 tasktrackers) using sort. Given 300MB/node of data and a sufficiently large io.sort.mb and fs.inmemory.size.mb, io.sort.spill.percent=1.0, fs.inmemory.merge.threshold=0, and mapred.inmem.usage=1.0, each reduce took an average of 121 seconds when reading from disk vs 79 seconds merging and reducing from memory. While the sort with the patch finished the job in 8 minutes instead of 9, both had slow tasktrackers that threw off the running time.

          This also includes some similar changes to MapTask, letting the record and serialization buffer soft limits be configured separately.

          Chris Douglas added a comment -

          This passes mapred/hdfs tests and patch validation on my machine and doesn't break LocalJobRunner (unlike 3446-0).

               [exec] -1 overall.
          
               [exec]     +1 @author.  The patch does not contain any @author tags.
          
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
          
               [exec]     -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
          
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
          
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
          

          The "javadoc warning" is from:

              [javadoc] javadoc: warning - Multiple sources of package comments found for package "org.apache.commons.logging"
              [javadoc] javadoc: warning - Multiple sources of package comments found for package "org.apache.commons.logging.impl"
          
          Chris Douglas added a comment -

          This changes reduce as follows:

          • Instead of specifying fs.inmemory.size.mb, map outputs will consume mapred.copy.inmem.percent of the maximum heap size as returned from Runtime.maxMemory(), defaulting to 0.7. mapred.child.java.opts defaults to 200mb and fs.inmemory.size.mb defaults to 75mb, so this might be considered an incompatible change.
          • The memory threshold at which the in-memory merge will start during the shuffle is now user-configurable (mapred.inmem.merge.usage), defaulting to the old value of 0.66. mapred.inmem.merge.threshold still controls the maximum number of segments.
          • Instead of performing a final in-memory merge, the segments are left in memory. At the beginning of the sort phase, the ReduceCopier is queried for an Iterator to the reduce. A user-configurable property mapred.reduce.inmem.percent determines the maximum size of the segments to be merged from memory during the reduce, relative to the ShuffleRamManager threshold. If the retained segments exceed this threshold, then they must be written to disk before the reduce starts. If there are sufficient segments already on disk to require intermediate merges, they will be rolled into the first merge; otherwise they will be merged to disk. The merge into the reduce will contain all the segments that fit below the in-memory reduce threshold, from RAM and from the on-disk segments. So given:
            +----+ <- Max heap memory (e.g. -Xmx512m) (H)
            |    |
            |----| <- mapred.copy.inmem.percent (C)
            |    |
            |    |
            |----| <- mapred.reduce.inmem.percent (R)
            |    |
            +----+
            

            The maximum memory used for copying map output will be H*C, while the minimum memory available to the reduce will be H*(1-C*R).
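The two budgets above are simple to compute. The following back-of-the-envelope sketch (the class and helper names are illustrative, not code from the patch) evaluates H*C and H*(1-C*R) for a 512 MB heap with the example percentages:

```java
public class ReduceMemBudget {
  /** Maximum bytes used for buffering map outputs during the shuffle: H*C. */
  static long shuffleBudget(long maxHeap, double copyPct) {
    return (long) (maxHeap * copyPct);
  }

  /** Minimum bytes left for the reduce itself: H*(1 - C*R). */
  static long reduceBudget(long maxHeap, double copyPct, double reducePct) {
    return (long) (maxHeap * (1 - copyPct * reducePct));
  }

  public static void main(String[] args) {
    long h = 512L * 1024 * 1024; // H: e.g. -Xmx512m
    double c = 0.7;              // C: mapred.copy.inmem.percent
    double r = 0.5;              // R: mapred.reduce.inmem.percent

    System.out.println("shuffle budget MB: " + shuffleBudget(h, c) / (1024 * 1024));
    System.out.println("reduce budget MB:  " + reduceBudget(h, c, r) / (1024 * 1024));
  }
}
```

With these values the shuffle may buffer up to roughly 358 MB of map output, while at least roughly 332 MB remains available to the reduce once the retained segments are trimmed to the R threshold.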

          This passes all unit tests on my machine. I'll work on measuring its performance and post the results presently.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389141/3446-2.patch
          against trunk revision 690142.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3143/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3143/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3143/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3143/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          You need to add some tests for this. You should also have some forrest edits to explain the usage of the config variables.

          Chris Douglas added a comment -

          I'll add unit tests/docs with the next patch.

          As a benchmark, I tried RandomWriter on 19 TaskTrackers, 1GB/node, followed by several sort runs. The max heap memory is set to 512MB, mapred.copy.inmem.percent to 0.8, dfs.replication to 1. The times recorded are the min/max/avg time for the reduce from the end of the shuffle to the end of the reduce.

          Params are formatted as: io.sort.factor/mapred.inmem.merge.threshold/mapred.inmem.merge.usage/mapred.reduce.inmem.percent

           Params          Min     Max     Avg      Notes
           100/0/1.0/1.0    8.35   57.775  23.1603  Never hits disk
           9/15/1.0/0.01   11.164  67.569  38.0216  Spills several times, merges some in-memory segments during intermediate merge
           100/0/1.0/0.5   11.215  74.59   33.8571  Spills some segments to disk before starting reduce
           100/0/1.0/0.0   17.184  88.479  59.5489  Spills all segments to disk before starting reduce
          Chris Douglas added a comment -

          Added a unit test. I'm not sure where documentation for the new parameters belongs...

          Chris Douglas added a comment -

          Added documentation

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389329/3446-4.patch
          against trunk revision 691099.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3158/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3158/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3158/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3158/console

          This message is automatically generated.

          Chris Douglas added a comment -

          Move unrelated changes to MapTask into a separate JIRA (HADOOP-4063) and change some LinkedLists to ArrayLists.

          Chris Douglas added a comment -

          Submitting to Hudson.

          Owen O'Malley added a comment -

          This looks good, but I think we should define the new parameter mapred.reduce.inmem.percent as a percent of the total heap size rather than a percent of mapred.copy.inmem.percent.

          I'd also change the names to:
          mapred.reduce.input.buffer.percent
          mapred.shuffle.input.buffer.percent
          mapred.shuffle.merge.percent

          Other than that, it looks good.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389455/3446-5.patch
          against trunk revision 692335.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3184/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3184/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3184/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3184/console

          This message is automatically generated.

          Chris Douglas added a comment -

          Changed the config variable names and the semantics of the reduce percentage, and updated documentation and tests to reflect this.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389602/3446-6.patch
          against trunk revision 692597.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3193/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3193/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3193/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3193/console

          This message is automatically generated.

          Chris Douglas added a comment -

          Merged with trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389772/3446-7.patch
          against trunk revision 693587.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3222/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3222/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3222/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3222/console

          This message is automatically generated.

          Chris Douglas added a comment -

          The test failure is not related.

          Arun C Murthy added a comment -

          +1

          Chris Douglas added a comment -

          I just committed this.


            People

            • Assignee: Chris Douglas
            • Reporter: Owen O'Malley
            • Votes: 0
            • Watchers: 5
