Hadoop Map/Reduce: MAPREDUCE-64

Map-side sort is hampered by io.sort.record.percent

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: performance, task
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Currently io.sort.record.percent is a fairly obscure, per-job configurable, expert-level parameter which controls how much accounting space is available for records in the map-side sort buffer (io.sort.mb). Typical values for io.sort.mb (100) and io.sort.record.percent (0.05) imply that we can store ~350,000 records in the buffer before a sort/combine/spill becomes necessary.

      However, for many applications that deal with small records, e.g. the world-famous wordcount and its family, this implies we can use only 5-10% of io.sort.mb (i.e., 5-10MB) before we spill, in spite of having much more memory available in the sort buffer. Wordcount, for example, results in ~12 spills (given an HDFS block size of 64M). The presence of a combiner exacerbates the problem by piling on serialization/deserialization of records as well...

      Sure, jobs can configure io.sort.record.percent, but it's tedious and obscure; we really can do better by getting the framework to automagically pick it, using all available memory (up to io.sort.mb) for either the data or the accounting.

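      To make the arithmetic concrete, here is a back-of-the-envelope sketch (assumptions: 16 bytes of accounting per record, a ~20-byte average record, and a 64MB block; only the two config values come from the description above):

        // Sketch only: models the old static split of io.sort.mb into an
        // accounting region and a data region. Record/block sizes are assumed.
        public class SortBufferMath {
          static final int ACCT_BYTES_PER_REC = 16; // per-record metadata

          public static void main(String[] args) {
            long ioSortBytes = 100L << 20;       // io.sort.mb = 100
            double recordPercent = 0.05;         // io.sort.record.percent
            long acctBytes = (long) (ioSortBytes * recordPercent);
            long dataBytes = ioSortBytes - acctBytes;

            long maxRecords = acctBytes / ACCT_BYTES_PER_REC;
            System.out.printf("records before spill: ~%d%n", maxRecords); // ~327,680

            int avgRecordSize = 20;              // assumed wordcount-ish record
            long dataUsed = maxRecords * avgRecordSize;
            System.out.printf("data region used: %d%% of %d MB%n",
                100 * dataUsed / dataBytes, dataBytes >> 20); // ~6% of 95 MB

            long blockSize = 64L << 20;          // assumed HDFS block size
            System.out.printf("spills per map: ~%d%n",
                (blockSize + dataUsed - 1) / dataUsed); // ~11 with these numbers
          }
        }

      Attachments
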
      1. M64-10.patch
        114 kB
        Chris Douglas
      2. M64-9.patch
        106 kB
        Chris Douglas
      3. M64-8.patch
        108 kB
        Chris Douglas
      4. M64-7.patch
        107 kB
        Chris Douglas
      5. M64-6.patch
        108 kB
        Chris Douglas
      6. M64-5.patch
        108 kB
        Chris Douglas
      7. M64-4.patch
        106 kB
        Chris Douglas
      8. M64-0i.png
        30 kB
        Chris Douglas
      9. M64-1i.png
        32 kB
        Chris Douglas
      10. M64-2i.png
        29 kB
        Chris Douglas
      11. M64-3.patch
        93 kB
        Chris Douglas
      12. M64-2.patch
        89 kB
        Chris Douglas
      13. M64-1.patch
        86 kB
        Chris Douglas
      14. M64-0.patch
        80 kB
        Chris Douglas

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #228 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/228/)
        MAPREDUCE-64. Eliminate io.sort.record.percent from MapTask configuration.

        Chris Douglas added a comment -

        I committed this.

        Arun C Murthy added a comment -

        +1

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12434447/M64-10.patch
        against trunk revision 905008.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 22 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/423/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/423/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/423/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/423/console

        This message is automatically generated.

        Chris Douglas added a comment -
        • Merge with trunk.
        • Removed refs to io.sort.record.percent.
        • Updated documentation.
        Chris Douglas added a comment -

        The failing test, TestTaskTrackerBlacklist, is not related. Filed MAPREDUCE-1412.

        mapreduce.map.sort.record.percent and mapreduce.map.sort.spill.percent should be mapreduce.map.sort.record.fraction and mapreduce.map.sort.spill.fraction, respectively. They scale from 0.0 to 1.0, not from 0.0 to 100.0 or to 100.

        There are a number of floats ranging from 0 to 1 in the configuration named *.percent, so this will remain consistent. Since there is a large renaming/deprecation of variables in 0.21 (MAPREDUCE-849), feel free to pursue this point in a separate issue covering all instances.
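
        For illustration, a minimal sketch of the fraction semantics using the generic Configuration float API (the property name is from this discussion; the 0.80 value is just an example):

          // Despite the ".percent" suffix, the value is a float in [0.0, 1.0],
          // not 0-100: here, spill when the collection buffer is 80% full.
          import org.apache.hadoop.mapred.JobConf;

          public class SpillPercentExample {
            public static void main(String[] args) {
              JobConf conf = new JobConf();
              conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);
              System.out.println(conf.getFloat("mapreduce.map.sort.spill.percent", 0.0f));
            }
          }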

        Dick King added a comment -

        One minor thing ...

        mapreduce.map.sort.record.percent and mapreduce.map.sort.spill.percent should be mapreduce.map.sort.record.fraction and mapreduce.map.sort.spill.fraction, respectively. They scale from 0.0 to 1.0, not from 0.0 to 100.0 or to 100.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12431283/M64-9.patch
        against trunk revision 902727.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/406/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/406/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/406/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/406/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12431283/M64-9.patch
        against trunk revision 902717.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/405/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/405/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/405/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/405/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Merge with trunk.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12431189/M64-8.patch
        against trunk revision 902272.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/402/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/402/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/402/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/402/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Merge with trunk.

        Chris Douglas added a comment -

        The failure in TestRecoveryManager is not related:

        2010-01-12 00:23:10,054 ERROR mapred.MiniMRCluster (MiniMRCluster.java:run(121)) - Job tracker crashed
        java.lang.IllegalArgumentException: port out of range:-1
        	at java.net.InetSocketAddress.<init>(InetSocketAddress.java:118)
        	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:166)
        	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:124)
        	at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1409)
        	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:243)
        	at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner.run(MiniMRCluster.java:118)
        	at java.lang.Thread.run(Thread.java:619)
        

        Looks like HADOOP-4744 in the JobTracker, which fails to start, so MiniMRCluster.JobTrackerRunner.tracker is null. Instead of logging the error, MiniMRCluster should fail the test (MAPREDUCE-1366).

        It also passes on my machine.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12429943/M64-7.patch
        against trunk revision 898019.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/374/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/374/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/374/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/374/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Merged with trunk.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12429872/M64-6.patch
        against trunk revision 897118.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/260/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/260/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/260/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/260/console

        This message is automatically generated.

        Chris Douglas added a comment -

        The test case needs to be fixed.

        It assumes the vint length of the key will match the vint length of the key data, but this isn't true for some values, like 128 (a 127-byte record + 1 vint byte; the test checks against 2 vint bytes for the full 128-byte record length).
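
        For example, a minimal check of the vint boundary using WritableUtils (the 127/128 framing follows the comment above):

          import org.apache.hadoop.io.WritableUtils;

          public class VIntBoundary {
            public static void main(String[] args) {
              // A 128-byte record is framed as a 127-byte body plus 1 vint byte,
              // but encoding the full 128-byte length itself takes 2 vint bytes.
              System.out.println(WritableUtils.getVIntSize(127)); // 1
              System.out.println(WritableUtils.getVIntSize(128)); // 2
            }
          }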

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428718/M64-5.patch
        against trunk revision 893361.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/336/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/336/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/336/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/336/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Trying Hudson again

        Chris Douglas added a comment -

        The failing test cases are also in trunk (MAPREDUCE-1311, MAPREDUCE-1312). The TestJobCleanup failure appears unrelated, and the test passes on my machine.

        The reason I find it odd is that conventionally one can flush a stream an arbitrary number of times without destroying the stream. This is clearly not the case here. [...]

        Agreed. And yes, my intent was to address this in MAPREDUCE-1324.

        Would you please file a jira wrt this?

        Sure. Filed MAPREDUCE-1329

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12428718/M64-5.patch
        against trunk revision 893055.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/236/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/236/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/236/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/236/console

        This message is automatically generated.

        Hong Tang added a comment -

        This is an interesting idea. Clever implementations could also avoid skewing the average record size disproportionately (possibly an independent issue). Please file a JIRA.

        Will do.

        The distinction between flush and close is not clear for a Collector.

        The reason I find it odd is that conventionally one can flush a stream an arbitrary number of times without destroying the stream. This is clearly not the case here. Yes, I agree that MAPREDUCE-1211 would be relevant here. I am also fine with deferring the work of making the distinction between close and flush consistent with the java.io stream convention to MAPREDUCE-1324 (assuming that is your intention).

        Since the testing/validation of this patch is difficult, and you've already done the work, I'd like to postpone this to a separate issue if that's OK.

        That is fine. Would you please file a jira wrt this?

        Chris Douglas added a comment -

        Thank you for the detailed comments, Hong.

        The logic of calculating the equator seems to be missing a multiplication by METASIZE

        In SpillThread: "if (bufend < bufindex && bufindex < bufstart)" should probably be "if (bufend < bufstart) {"

        Buffer.write(byte[], int, int): "blockwrite = distkvi < distkve" should be "blockwrite = distkvi <= distkve"

        Great catches! Fixed.

        A potential inefficiency if we encounter a large record when there are few (but not zero) records in the buffer - this would lead to these few records written out as a single spill. A better way is to spill out the single large record, and continue accumulating records after that.

        This is an interesting idea. Clever implementations could also avoid skewing the average record size disproportionately (possibly an independent issue). Please file a JIRA.

        TestMapCollection: uniform random is used [...] Suggest changing to a distribution that gives more weight to small values

        Soright. Modified the random testcase.

        Any particular reason to shut down the thread in Buffer.flush() rather than Buffer.close()?

        Only history. The distinction between flush and close is not clear for a Collector, particularly since one or the other is a noop for map-only/reducer'd jobs. Pulling the MapOutputBuffer into a standalone class could help to refine the distinction. Work such as MAPREDUCE-1211 would clearly benefit; IIRC, the current version of that proposal also pulled out the collector. Filed MAPREDUCE-1324 to track extracting the buffer from MapTask.

        I also have a couple of suggestions on refactoring the code to make it more readable [...]

        These are all good suggestions. I thought of the index-based code as inferring high-level abstractions from low-level state, but the spillExists, spillInProgress flags distill a lot of esoteric, often redundant calculation into a more understandable format. There's another missing abstraction for setting/querying metadata, which could replace the inline kvmeta manipulations. Since the testing/validation of this patch is difficult, and you've already done the work, I'd like to postpone this to a separate issue if that's OK.

        Other very minor nits [...]

        Fixed all these.

        Thank you again for so thorough a review.

        Hong Tang added a comment -

        The design is quite clever and elegant. I like it. The code is clean, but a bit tricky to understand (more on this below with some of my suggestions on refactoring).

        • MapOutputBuffer.collect: The logic of calculating the equator seems to be missing a multiplication by METASIZE. Should be:
                         final int newPos = (bufindex +
                             Math.max(2 * METASIZE - 1,
                                 Math.min(distkvi / 2,
                                     distkvi / (METASIZE + avgRec) * METASIZE)))
        • Buffer.write(byte[], int, int): "blockwrite = distkvi < distkve" should be "blockwrite = distkvi <= distkve"
        • A potential inefficiency if we encounter a large record when there are few (but not zero) records in the buffer - this would lead to these few records written out as a single spill. A better way is to spill out the single large record, and continue accumulating records after that. This should be a very rare corner case, so it may not need to be addressed in this jira. It would be nice to mark it with a TODO in the comments.
        • Any particular reason to shut down the thread in Buffer.flush() rather than Buffer.close()?
        • In SpillThread: "if (bufend < bufindex && bufindex < bufstart)" should probably be "if (bufend < bufstart) {"
        • In TestMapCollection: uniform random is used to determine how many bytes to write in serialization, and to determine key/value size for RandomFactory. This is less desirable in the sense that very small values are not sufficiently tested. Suggest changing to a distribution that gives more weight to small values, e.g. (min + Math.exp(random.nextDouble()*Math.log(max-min))); see the sketch after this list.

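        A minimal sketch of the suggested log-scaled draw next to a uniform one (min, max, and the seed are arbitrary stand-ins for the test's size bounds):

          import java.util.Random;

          public class SkewedSizes {
            public static void main(String[] args) {
              Random random = new Random(42);
              int min = 1, max = 1 << 16;
              for (int i = 0; i < 5; i++) {
                // Uniform rarely yields very small sizes when max is large.
                int uniform = min + random.nextInt(max - min);
                // Log-scaled: (max-min)^u for u in [0,1), so half of all draws
                // fall below sqrt(max-min) + min, exercising tiny records often.
                int skewed = (int) (min + Math.exp(random.nextDouble() * Math.log(max - min)));
                System.out.printf("uniform=%5d  skewed=%5d%n", uniform, skewed);
              }
            }
          }
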
        I also have a couple of suggestions on refactoring the code to make it more readable:

        • Separate the set of variables used by the main thread for writing from the set used by the spill thread for spilling. (Currently kvend and bufend are used in two different contexts: when there is a spill active and when there is not.)
        • Related to the above, add a variable called spillExists to describe the state when there is a spill buffer. The lifetime of spillExists==TRUE covers that of spillInProgress==TRUE.
        • Suggest changing the direct (idx+offset) based access to kvmeta to method calls.
        • Suggest refactoring the logic on marking a spill region.

        Other very minor nits:

        • MapOutputBuffer.collect: it would be nice to spell out the invariant that there are always METASIZE bytes available beyond kvindex.
        • MapOutputBuffer: document the use of "bufferRemaining" as a hint for whether we may need to block and spill. If bufferRemaining>=0, there is guaranteed space for us to continue writing.
        • BlockBuffer is only usable inside MapOutputBuffer; suggest removing the constructor BlockBuffer(OutputStream).
        • Suggest renaming BlockBuffer.reset() to BlockBuffer.shiftKeyBuffer().
        • Suggest adding a note to Buffer.write(byte[], int, int) that the checking of bufferRemaining should not be bypassed even if len==0.
        Chris Douglas added a comment -

        Both the core (MAPREDUCE-1271) and contrib (MAPREDUCE-1124) failures are known.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12427438/M64-4.patch
        against trunk revision 888761.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/176/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/176/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/176/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/176/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Merged with trunk.

        Thanks for running the coverage tool, Todd.

        Since we're using the Local Runner for these tests, it's all a single partition

        TestMiniMRDFSSort is the only test I know of that uses multiple partitions. I hope that will change after MAPREDUCE-1050.

        Line 1097 ("if (bufindex + headbytelen < avail) {" in void reset()) is always true in our tests. We should get a test case to exercise the other half of this branch.

        This is awkward to write reliably; the patch already adds a lot of very specific test code, and the tested path is the more aggressive one. I'm OK leaving this for now.

        We don't currently run any tests with job.getCompressMapOutput returning true

        Line 1365 (kvstart >= kvend ternary in sortAndSpill) is always true.

        Added cases to TestMapCollection.

        can you put in a small comment describing the synchronization policy for the various offsets? Those used to be volatile and now they're under a lock, so it should be good to note that in the code.

        The patch, and this issue, add a fair amount of documentation, including calling out some of the places where locking and visibility are non-trivial. The volatile modifier in the existing code was paranoid, if not redundant.

        Chris Douglas added a comment -

        The patch no longer applies after HADOOP-1147

        Todd Lipcon added a comment -

        Thanks for those great diagrams - they really helped me understand things much better! A picture is worth 1000 lines of code or something

        I applied your patch just now and ran it through clover for coverage analysis. Here are a couple things I think we should cover before committing:

        • We don't currently run any tests with job.getCompressMapOutput returning true. This caused an issue or two with the shuffle in the past, so we should get at least one test that uses a codec.
        • Since we're using the Local Runner for these tests, it's all a single partition. This is probably OK, since I imagine other tests throughout Hadoop exercise those paths (I'm only looking at coverage from TestMapCollection here)
        • Line 1097 ("if (bufindex + headbytelen < avail) {" in void reset()) is always true in our tests. We should get a test case to exercise the other half of this branch.
        • Line 1365 (kvstart >= kvend ternary in sortAndSpill) is always true. Should exercise the other half of that.

        On a code level, one more thing I noticed - can you put in a small comment describing the synchronization policy for the various offsets? Those used to be volatile and now they're under a lock, so it should be good to note that in the code.

        I'll try to get a chance to run some basic benchmarks later this week.

        Chris Douglas added a comment -

        one simple way might be to simply add TRACE level log messages at every collect() call with the current values of every index plus the spill number [...]

        That could be an interesting visualization. I'd already made up the diagrams, but anything that helps the analysis and validation would be welcome. I'd rather not add a trace to the committed code, but data from it sounds great.

        I ran a simple test where I was running a sort of 10 byte records, and it turned out that the "optimal" io.sort.record.percent caused my job to be significantly slower. It was the case then that a small number of large spills actually ran slower than a large number of small spills. Did we ever determine what that issue was? I think we should try to understand why the theory isn't agreeing with observations here.

        IIRC those tests used a non-RawComparator, right? Runping reported similar results, where hits to concurrent collection were more expensive than small spills. The current theory is that keeping the map thread unblocked is usually better for performance. Based on this observation, I'm hoping that the spill.percent can also be eliminated at some point in the future, though the performance we're leaving on the table there is probably not as severe and is more difficult to generalize. Microbenchmarks may also not capture the expense of merging many small spills in a busy, shared cluster, where HDFS and other tasks are competing for disk bandwidth. I'll be very interested in metrics from MAPREDUCE-1115, as they would help to flesh out this hypothesis.


        The documentation (such as it is) in HADOOP-2919 describes the existing code. The metadata are tracked using a set of indices marking the start and end of a spill (kvstart, kvend) and the current position (kvindex), while the serialization data are described by similar markers (bufstart, bufend, bufindex). There are two other indices carried over from the existing design. bufmark is the position in the serialized record data of the end of the last fully serialized record. bufvoid is necessary for the RawComparator interface, which requires contiguous ranges for key compares; if a serialized key crosses the end of the buffer, it must be copied to the front to satisfy the aforementioned API spec. All of these are retained; the role of each is largely unchanged.

        The proposed design adds another parameter, the equator (while kvstart and bufstart could be replaced with a single variable similar to equator, the effort seemed misspent). The record serialization moves "forward" in the buffer, while the metadata are allocated in 16-byte blocks in the opposite direction. This is illustrated in the attached diagrams.

        The role played by kvoffsets and kvindices is preserved; logically, particularly in the spill, each is interpreted in roughly the same way. In the new code, the allocation is not static, but will instead expand with the serialized records. This avoids degenerate cases for combiners and multilevel merges (though not necessarily optimal performance).
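
        A toy model of this two-directional layout, under simplifying assumptions (this is not the MapTask code; the buffer size and alignment details are illustrative):

          // Toy model: serialized record bytes grow "forward" from the equator
          // while fixed-size metadata blocks grow "backward", both modulo the
          // buffer length. Simplified; ignores spill thresholds and locking.
          public class EquatorToy {
            static final int METASIZE = 16;            // one metadata block per record
            final byte[] kvbuffer = new byte[1 << 12]; // tiny stand-in for io.sort.mb
            int equator, bufindex, kvindex;

            void setEquator(int pos) {
              equator = pos;
              bufindex = pos;                          // data resumes at the equator
              // first metadata block sits just "behind" the equator, aligned
              // so no block spans the end of the array
              int aligned = pos - (pos % METASIZE);
              kvindex = (aligned - METASIZE + kvbuffer.length) % kvbuffer.length;
            }

            void collect(byte[] serializedRecord) {
              for (byte b : serializedRecord) {        // record data moves forward
                kvbuffer[bufindex] = b;
                bufindex = (bufindex + 1) % kvbuffer.length;
              }
              // claim the current metadata block, then step backward
              kvindex = (kvindex - METASIZE + kvbuffer.length) % kvbuffer.length;
            }

            public static void main(String[] args) {
              EquatorToy t = new EquatorToy();
              t.setEquator(0);
              t.collect(new byte[]{1, 2, 3});
              System.out.println("bufindex=" + t.bufindex + " kvindex=" + t.kvindex);
            }
          }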

        Spills are triggered under two conditions: either the soft limit is reached (collection proceeds concurrently with the spill) or a record is large enough to require a spill before it can be written to the buffer (collection is blocked). The former case is illustrated here:

        The equator is moved to an offset proportional to the average record size (caveats above), and kvindex is moved off the equator, aligned with the end of the array (int alignment, also so that no metadata block will span the end of the array). Collection proceeds again from the equator, growing toward the ends of the spill. Should either run out of space, collection will block until the spill completes. Note that there is no partially written data when the soft limit is reached; it can only be triggered in collect, not in the blocking buffer.
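
        As a rough illustration of "proportional to the average record size" (the formula and names are assumptions for exposition, not lifted from the patch): with 16 bytes of metadata per record, placing the equator so that the metadata share of the free region is 16/(16 + avg) lets both sides run out at about the same time.

            public class EquatorSketch {
              // Split a free region so metadata (16 bytes/record, growing backward
              // from the equator) and serialized records (growing forward) are
              // exhausted together.
              static int newEquator(int freeStart, int freeBytes, double avgRecordBytes) {
                double metaFraction = 16.0 / (16.0 + avgRecordBytes);
                int metaBytes = ((int) (freeBytes * metaFraction)) & ~3; // int-aligned
                return freeStart + metaBytes;
              }

              public static void main(String[] args) {
                // With ~100-byte records, roughly 14% of the free space goes to
                // metadata: ~9 MB of a 64 MB free region.
                System.out.println(newEquator(0, 64 << 20, 100.0));
              }
            }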

        The other case to consider is when record data are partially written into the collection buffer, but the available space is exhausted:

        Here, the equator is moved to the beginning of the partial record and collection blocks. When the spill completes, the metadata are written off the equator and serialization of the record can continue.

        During collection, indices are adjusted only when holding a lock. As in the current code, the lock is only obtained in collect when one of the possible conditions for spilling may be satisfied. Since collect does not block, every serialized record performs a zero-length write into the buffer once both the key and value are written. This ensures that all the boundaries are checked and that collection will block if necessary.
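
        A toy analogue of the zero-length write, since the trick is easy to miss (this is not the MapTask code; the real collector blocks for the spill rather than throwing): because the boundary check lives in write(), an empty write after each record forces the check to run even when the record serialized to zero bytes.

            import java.io.IOException;
            import java.io.OutputStream;

            public class ZeroLengthWriteSketch extends OutputStream {
              private final byte[] buf = new byte[8];
              private int pos = 0;

              @Override public void write(int b) throws IOException {
                write(new byte[] { (byte) b }, 0, 1);
              }

              @Override public void write(byte[] b, int off, int len) throws IOException {
                // The check runs on every call, including len == 0; the real code
                // would block here until the spill frees space instead of throwing.
                if (pos >= buf.length || pos + len > buf.length) {
                  throw new IOException("buffer full at " + pos);
                }
                System.arraycopy(b, off, buf, pos, len);
                pos += len;
              }

              public static void main(String[] args) throws IOException {
                ZeroLengthWriteSketch out = new ZeroLengthWriteSketch();
                out.write(new byte[8], 0, 8);     // fills the buffer exactly
                try {
                  out.write(new byte[0], 0, 0);   // zero bytes, but the check still fires
                } catch (IOException full) {
                  System.out.println("blocking point reached: " + full.getMessage());
                }
              }
            }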

        That's about it.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422471/M64-3.patch
        against trunk revision 825469.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/180/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/180/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/180/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/180/console

        This message is automatically generated.

        Todd Lipcon added a comment -

        Thanks for those changes.

        Regarding diagrams, I had an idea - one simple way might be to simply add TRACE level log messages at every collect() call with the current values of every index plus the spill number in simple TSV format. Feeding those into gnuplot or R to graph them should produce something reasonably informative without much manual diagramming. If you want, I can try to do this and post a patch - just let me know. My inspiration here was seekwatcher: http://oss.oracle.com/~mason/seekwatcher/ext3_vs_btrfs_vs_xfs.png

        Also, I remember when we were first looking at this last August, I ran a simple test where I was running a sort of 10 byte records, and it turned out that the "optimal" io.sort.record.percent caused my job to be significantly slower. It was the case then that a small number of large spills actually ran slower than a large number of small spills. Did we ever determine what that issue was? I think we should try to understand why the theory isn't agreeing with observations here.

        Chris Douglas added a comment -
        • Changed mapreduce.map.sort.spill.percent description
        • Added a testcase randomizing spill percentage, key and value sizes, and record counts
        • Correctly used Job.COMPLETION_POLL_INTERVAL_KEY (the semantics of Configuration with Cluster and Job are a little odd...)
        • There's a case not properly handled in the last patch: it assumes that when kvstart == kvend the spill is not running, but this doesn't hold when there is only one record in the spill. Instead of inferring the spill state from the indices, a boolean is both more legible and correct (see the sketch after this list).
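
        An illustrative fragment of that last point (field names assumed): with exactly one record in the spill, kvstart == kvend holds both before and during the spill, so the indices alone are ambiguous; an explicit flag is not.

            public class SpillStateSketch {
              private int kvstart, kvend, kvindex;
              private boolean spillInProgress;  // replaces inference from the indices

              void startSpill() {
                kvend = kvindex;                // freeze the metadata range to spill
                spillInProgress = true;         // unambiguous even for a 1-record spill
              }

              void spillDone() {
                kvstart = kvend;                // reclaim the spilled region
                spillInProgress = false;
              }

              boolean spillRunning() {
                return spillInProgress;         // kvstart == kvend cannot tell us this
              }
            }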

        Working on diagrams, etc.

        Chris Douglas added a comment -

        Good ideas; thanks, Todd.

        In the configuration parameter, there's the text "Note that this does not imply any chunking of data to the spill." - I'm not sure what this is trying to say [...]

        I was trying to say that, for example, a spill percentage of .3 doesn't mean that collection will attempt to make its spills at .3 * io.sort.mb. If the spill thread isn't done when the collection passes .6, it will happily continue to collect until the buffer is completely full or the spill completes. I'll try to clarify the comment.
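
        As a usage sketch (key names as in this patch's configuration; the comments just restate the semantics above), the percentage is only the trigger for the background spill, not a chunk size:

            import org.apache.hadoop.conf.Configuration;

            public class SpillPercentSketch {
              public static Configuration configure() {
                Configuration conf = new Configuration();
                conf.setInt("io.sort.mb", 100);
                // Start spilling once 30% of the buffer is used; collection keeps
                // filling the remaining 70% concurrently and only blocks if the
                // buffer fills before the spill thread finishes.
                conf.setFloat("mapreduce.map.sort.spill.percent", 0.3f);
                return conf;
              }
            }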

        Consider writing TestMapCollection as a true unit test rather than submitting jobs to the local cluster [...]

        If you do want to submit jobs to the local runner, set Job.COMPLETION_POLL_INTERVAL_KEY [...]

        This should be set to 100 in all the jobs (TestMapCollection::runtest(String, Job)), but the delays are still there... ah, but I'm setting it after the Job is created... OK, I see. I'll correct that. I agree that TestMapCollection would be better written with mock objects. Once MAPREDUCE-1050 is resolved, I'll look into it. Perhaps as a separate issue?
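
        For illustration, the corrected ordering might look like this (a minimal sketch using the 0.21-era constructor, not the test itself):

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.mapreduce.Job;

            public class PollIntervalSketch {
              static Job makeJob() throws Exception {
                Configuration conf = new Configuration();
                // Must be set before the Job is constructed: the poll interval is
                // read from the cluster configuration, not the job configuration.
                conf.setInt(Job.COMPLETION_POLL_INTERVAL_KEY, 100);
                return new Job(conf);
              }
            }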

        suggested test: would be good to have a RecordFactory that generates records of randomized length to truly stress test the system. If we can figure out a test that generates random lengths but has output that's still verifiably correct (in terms of contents and not just total record count) that would be ideal. [...]

        This sounds close to TestMiniMRDFSSort, but a random case for the collection unit test makes sense.

        For the purpose of continued reviewing (and future documentation) I'd love to see some kind of diagram of the various (I count 9!) pointers into the data buffer. [...]

        There's some related ASCII art in HADOOP-2919, but I'll create some diagrams for this issue and attach them with a more thorough explanation.

        Thanks again for the notes.

        Todd Lipcon added a comment -

        Hi Chris,

        I started looking over this yesterday. I haven't had a chance to go fully through it, given how dense the code is, but here are a few of my initial thoughts:

        • In the configuration parameter, there's the text "Note that this does not imply any chunking of data to the spill." - I'm not sure what this is trying to say. I see that you copied it from the old description, but I don't think it's very clear - if I don't understand it even with code-level familiarity, I doubt end users will.
        • Consider writing TestMapCollection as a true unit test rather than submitting jobs to the local cluster; Mockito can probably be helpful here. Or, separate out MapOutputBuffer into a static class (maybe even a new file?) so it's easier to test in isolation without mocking up as much stuff. I tend to prefer static inner classes (or non-inner classes) since their scope is more clearly identifiable, but I understand that's a personal preference.
        • If you do want to submit jobs to the local runner, set Job.COMPLETION_POLL_INTERVAL_KEY (mapreduce.client.completion.pollinterval) to 50ms or so instead of the default. This makes the test 4x faster on my system. You'll have to set this in a Configuration passed into new Job() in the tests, since it uses the cluster config, not the job config (I think this is a bug and I'll open a separate JIRA)
        • suggested test: it would be good to have a RecordFactory that generates records of randomized length to truly stress-test the system. If we can figure out a test that generates random lengths but has output that's still verifiably correct (in terms of contents and not just total record count), that would be ideal. One possibility is to calculate the product of the checksums of the individual records while generating, and verify that the product remains the same in the reducer. This would guard against a bug like a duplicated record output. If we had this as a separate (not run by default) stress/fuzz test which takes 20 minutes to run and throws all kinds of random data through the system, I think that would be fine (a sketch of the checksum-product idea follows this list).
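
        A minimal sketch of that checksum-product check (a hypothetical helper, not part of any patch): because multiplication is commutative, the map-side and reduce-side products match regardless of record order, and forcing each checksum odd keeps the product invertible mod 2^64, so a dropped or duplicated record changes it (modulo rare collisions).

            import java.util.zip.CRC32;

            public class ChecksumProduct {
              private long product = 1L;

              public void add(byte[] record) {
                CRC32 crc = new CRC32();
                crc.update(record, 0, record.length);
                long c = crc.getValue() | 1L;  // force odd: keeps the product nonzero
                product *= c;                  // wraps mod 2^64; order-independent
              }

              public long get() { return product; }

              public static void main(String[] args) {
                ChecksumProduct a = new ChecksumProduct();
                ChecksumProduct b = new ChecksumProduct();
                a.add("x".getBytes()); a.add("y".getBytes());
                b.add("y".getBytes()); b.add("x".getBytes()); // same records, new order
                System.out.println(a.get() == b.get());       // true
              }
            }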

        For the purpose of continued reviewing (and future documentation) I'd love to see some kind of diagram of the various (I count 9!) pointers into the data buffer. Do you think you could draw something up, or at least write up a sort of mini design doc either in this JIRA or in the code? Just a table of which points delineate what buffers, and what direction they grow in, would be great.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422203/M64-2.patch
        against trunk revision 825083.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/172/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/172/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/172/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/172/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Patch for review. Added comments, changed some debug messages to info.

        The fixed-size kvoffsets and kvindices arrays are replaced by per-record allocations into an IntBuffer overlay of the serialization buffer. The logic for the interlaced buffers is fundamentally the same as the existing code. Once the soft limit is reached, collection continues from the free space between the end of the meta and serialization, at an offset proportional to the average record size. This eliminates io.sort.record.percent and should use the space allocated within io.sort.mb more efficiently without configuration or study.

        Possibly controversial points:

        • After each record is serialized, there is a zero-length write from collect. This ensures that all the boundaries are checked and that the collection thread blocks when it is out of space. While it would be possible to block in collect, keeping the logic separate was cleaner and should impose no real penalty.
        • Once the soft limit is reached, at least half the free space is left for serialization data, but this makes no similar accommodation for metadata. Occasional, extremely large records may harm concurrency by skewing the average, as the average record size is based on the counters (i.e., the full task duration).
        • Dropped final on several variables and methods to avoid penalties for auto-generated accessors; also dropped volatile where unnecessary.
        • Suppressed some findbugs warnings. The inconsistent sync warnings should be spurious, as the variables are only referenced in the collection thread. The unreleased lock exception warning appeared without modifying the spill thread, but I moved the check back outside that thread anyway.
        • The TestMapCollection unit test ported to the new API takes longer. The old test runs in nearly the same duration on my machine, so I suspect this is probably an unrelated delay.
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421799/M64-1.patch
        against trunk revision 823227.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/156/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/156/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/156/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/156/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Fixed bugs, most notably: records that produce no writes (like NullWritable) won't block when the buffer is full, so metadata could overwrite serialization data. Modified a unit test to validate this. While logic to block could be pushed back into collect (as it was before), a zero-length write accomplishes the same thing, keeps things a little more comprehensible, and is a negligible amount of work.

        Continuing testing.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421173/M64-0.patch
        against trunk revision 819740.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/139/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/139/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/139/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/139/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Preliminary patch, merged with trunk. Still testing, evaluating performance, and verifying correctness.

        This modifies the existing sort as follows. The kvoffsets and kvindices arrays are merged and interlaced into the serialization buffer, read/written through an IntBuffer overlay. All collection proceeds from an "equator"; record serialization moves "forward" in the buffer while metadata moves "backward," both as circular buffers as in the existing design. The "io.sort.record.percent" parameter is no longer used. The spill percentage still controls the soft limit, which resets the equator into the unused part of the buffer at an offset proportional to the average record size.

        Chris Douglas added a comment -

        I didn't have time to finish this for 0.21, but - as I told Todd in early August - I don't have a lock on the issue. If someone wants to take a crack at this while I'm prevented from working on it, then that's their business.

        Jeff Hammerbacher added a comment -

        Hey Chris,

        Any progress on this patch? We've held off on working on it, since you mentioned you had some work already done. Should we continue to hold off?

        Thanks,
        Jeff

        Todd Lipcon added a comment -

        Any chance you could upload what you've got? For some reason I have an itch to hack on this this weekend

        Chris Douglas added a comment -

        I was working on a patch, but got pulled into other things. It's close enough for me to finish if I can find some time in the next couple weeks.

        Todd Lipcon added a comment -

        Hi Arun,

        Have you guys worked on this at all already? I'm interested in playing around with rewriting part of the map-side sort to get rid of this tunable. Like you said, for a lot of applications the default values are way off: 350K records in 95MB = 271 bytes average record size, which is larger than probably the majority of jobs we see in practice. If you've already worked on this I don't want to duplicate your effort, but if not, I think it would be a good step towards better average performance without expert tuning.
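
        For concreteness, the arithmetic behind those defaults, using the 16 bytes of accounting data per record described elsewhere in this issue:

            accounting space: 100 MB * 0.05          = 5 MB
            record capacity:  5 MB / 16 bytes/record ≈ 327,680 (~350K) records
            data space:       100 MB - 5 MB          = 95 MB
            implied record:   95 MB / ~350K records  ≈ 271 bytes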


          People

          • Assignee: Chris Douglas
          • Reporter: Arun C Murthy
          • Votes: 0
          • Watchers: 24
