Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1287

Avoid calling Partitioner with only 1 reducer

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      For jobs with only one reducer, the Partitioner will no longer be called. Applications depending on Partitioners modifying records for single reducer jobs will need to move this functionality elsewhere.

      Description

      Partitioners are currently called for each record, even though all are destined for the same reduce.

      1. M1287-4.patch
        2 kB
        Chris Douglas
      2. M1287-5.patch
        2 kB
        Chris Douglas
      3. M1287-6.patch
        7 kB
        Chris Douglas
      4. MAPREDUCE-1287.2.patch
        1 kB
        Ed Mazur
      5. MAPREDUCE-1287.3.patch
        1 kB
        Ed Mazur
      6. MAPREDUCE-1287.patch
        0.7 kB
        Ed Mazur

        Issue Links

          Activity

          Hide
          Ed Mazur added a comment -

          Trivial patch. Checks for the simple case of 1 reducer.

          Show
          Ed Mazur added a comment - Trivial patch. Checks for the simple case of 1 reducer.
          Hide
          Todd Lipcon added a comment -
          • Can you also make this change to the old API HashPartitioner? src/java/org/apache/hadoop/mapred/lib/HashPartitioner.java
          • The "else" on a separate line from the '}' is different style than the usual Hadoop style. Also a space after 'if' is usual style.
          Show
          Todd Lipcon added a comment - Can you also make this change to the old API HashPartitioner? src/java/org/apache/hadoop/mapred/lib/HashPartitioner.java The "else" on a separate line from the '}' is different style than the usual Hadoop style. Also a space after 'if' is usual style.
          Hide
          Ed Mazur added a comment -
          • Changes old API HashPartitioner as well
          • Corrected formatting
          Show
          Ed Mazur added a comment - Changes old API HashPartitioner as well Corrected formatting
          Hide
          Ed Mazur added a comment -

          Forgot about the space after "if".

          Show
          Ed Mazur added a comment - Forgot about the space after "if".
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427524/MAPREDUCE-1287.3.patch
          against trunk revision 888761.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427524/MAPREDUCE-1287.3.patch against trunk revision 888761. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/console This message is automatically generated.
          Hide
          Tom White added a comment -

          What size of performance gain does this change give?

          This might be better done in the framework, by using a special partitioner in the single reduce case. A class called, say, SinglePartitionPartitioner whose getPartition() method always returns 0.

          Show
          Tom White added a comment - What size of performance gain does this change give? This might be better done in the framework, by using a special partitioner in the single reduce case. A class called, say, SinglePartitionPartitioner whose getPartition() method always returns 0.
          Hide
          Ed Mazur added a comment -

          I haven't ran this, but here's a quick analysis:

          • Cost: (number of map output pairs)*(cost of "reducers == 1" check)
          • Gain: (number of map output pairs)*(cost of key's hashCode()), but only in the case of 1 reducer (no gain otherwise)

          Your suggestion of moving this into the framework makes a lot of sense. That way you only have to check for the 1 reducer case when you assign the partitioner and not for every map output, essentially eliminating the cost of the optimization.

          Show
          Ed Mazur added a comment - I haven't ran this, but here's a quick analysis: Cost: ( number of map output pairs )*( cost of "reducers == 1" check ) Gain: ( number of map output pairs )*( cost of key's hashCode() ), but only in the case of 1 reducer (no gain otherwise) Your suggestion of moving this into the framework makes a lot of sense. That way you only have to check for the 1 reducer case when you assign the partitioner and not for every map output, essentially eliminating the cost of the optimization.
          Hide
          Chris Douglas added a comment -

          I agree with Tom; this belongs in the framework if it's a legal optimization.

          Should this be marked as an incompatible change, since the partitioner is always called now? Clearly, any application that depends on the partitioner for correctness can be rewritten, but is it worth calling out?

          Show
          Chris Douglas added a comment - I agree with Tom; this belongs in the framework if it's a legal optimization. Should this be marked as an incompatible change, since the partitioner is always called now? Clearly, any application that depends on the partitioner for correctness can be rewritten, but is it worth calling out?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12428502/M1287-4.patch
          against trunk revision 892479.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428502/M1287-4.patch against trunk revision 892479. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/console This message is automatically generated.
          Hide
          Tom White added a comment -

          Any reason that the old partitioner uses 1 - numPartitions and the new one uses partitions - 1? It shouldn't make any difference since the partitioner is not actually used in the zero partition case, but it would be good to make the code consistent.

          > Clearly, any application that depends on the partitioner for correctness can be rewritten, but is it worth calling out?

          I think so - put a comment in the release notes.

          Show
          Tom White added a comment - Any reason that the old partitioner uses 1 - numPartitions and the new one uses partitions - 1 ? It shouldn't make any difference since the partitioner is not actually used in the zero partition case, but it would be good to make the code consistent. > Clearly, any application that depends on the partitioner for correctness can be rewritten, but is it worth calling out? I think so - put a comment in the release notes.
          Hide
          Chris Douglas added a comment -

          Sorry, I forgot to put a watch on this issue.

          Any reason that the old partitioner uses 1 - numPartitions and the new one uses partitions - 1?

          No, that was careless; thanks. Updated patch.

          Show
          Chris Douglas added a comment - Sorry, I forgot to put a watch on this issue. Any reason that the old partitioner uses 1 - numPartitions and the new one uses partitions - 1? No, that was careless; thanks. Updated patch.
          Hide
          Chris Douglas added a comment -

          Moved bad partitioner unit test to TestMiniMRDFSSort to run on a MiniMRCluster supporting >1 reducer.

          Show
          Chris Douglas added a comment - Moved bad partitioner unit test to TestMiniMRDFSSort to run on a MiniMRCluster supporting >1 reducer.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12429728/M1287-5.patch
          against trunk revision 897118.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/255/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/255/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/255/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/255/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12429728/M1287-5.patch against trunk revision 897118. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/255/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/255/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/255/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/255/console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12429732/M1287-6.patch
          against trunk revision 897118.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/366/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/366/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/366/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/366/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12429732/M1287-6.patch against trunk revision 897118. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/366/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/366/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/366/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/366/console This message is automatically generated.
          Hide
          Tom White added a comment -

          +1

          Show
          Tom White added a comment - +1
          Hide
          Chris Douglas added a comment -

          That's troubling. M1287-5.patch should have caused a failure on Hudson. I confirmed on my machine and it fails there, as well...

          Show
          Chris Douglas added a comment - That's troubling. M1287-5.patch should have caused a failure on Hudson. I confirmed on my machine and it fails there, as well...
          Hide
          Chris Douglas added a comment -

          I committed this.

          Show
          Chris Douglas added a comment - I committed this.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #216 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/216/)
          . Only call the partitioner with more than one reducer.

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #216 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/216/ ) . Only call the partitioner with more than one reducer.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #225 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/225/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #225 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/225/ )

            People

            • Assignee:
              Chris Douglas
              Reporter:
              Ed Mazur
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development