Hadoop Map/Reduce
MAPREDUCE-1182

Reducers fail with OutOfMemoryError while copying Map outputs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changes shuffle-related memory parameters from 'int' to 'long' so that sizes greater than the maximum integer value are handled correctly.
    • Tags:
      OutOfMemoryError, OOM reducer

      Description

      Reducers fail while copying Map outputs with the following exception:

      java.lang.OutOfMemoryError: Java heap space
          at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
          at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
          at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
          at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)

      The reducer's memory usage keeps increasing and ultimately exceeds the -Xmx value.
      I even tried giving each reducer -Xmx6.5g, but it still fails.

      Looking into the reducer logs, I found that the reducers were doing shuffleInMemory every time rather than shuffleOnDisk.

      1. M1182-1.patch
        4 kB
        Chris Douglas
      2. M1182-1v20.patch
        2 kB
        Chris Douglas
      3. M1182-0.patch
        2 kB
        Chris Douglas
      4. M1182-0v20.patch
        2 kB
        Chris Douglas
      5. HADOOP-6357.patch
        201 kB
        Chandra Prakash Bhagtani

        Activity

        Chandra Prakash Bhagtani added a comment -

        I ran the job with CDH2 (hadoop 0.20.1+133)

        Chandra Prakash Bhagtani added a comment -

        The problem was caused by Java int arithmetic in the reserve method of ReduceTask's ShuffleRamManager, in this check:
        // Wait till the request can be fulfilled...
        while ((size + requestedSize) > maxSize) {

        When (size + requestedSize) exceeds Integer.MAX_VALUE, the sum wraps around to a negative value, so the condition evaluates to false and the loop never waits. Every subsequent request therefore keeps reserving RAM until the JVM crashes.

        My fix is: while (((long)size + (long)requestedSize) > maxSize) {

        It worked!
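
        The wrap-around described above can be reproduced outside Hadoop. The following is a minimal sketch, not the actual ShuffleRamManager code: the class name, the method names, and the byte values are assumptions chosen only to push the int sum past Integer.MAX_VALUE.

```java
// Standalone illustration of the overflow in the reserve check.
// Class, method names, and values are hypothetical; only the two
// comparison expressions mirror the ones quoted in the comment.
final class ShuffleReserveOverflow {

    // Original form: int + int overflows before the comparison happens.
    static boolean mustWaitInt(int size, int requestedSize, int maxSize) {
        return (size + requestedSize) > maxSize;
    }

    // Proposed form: widen both operands to long before adding.
    static boolean mustWaitLong(int size, int requestedSize, int maxSize) {
        return ((long) size + (long) requestedSize) > maxSize;
    }

    public static void main(String[] args) {
        int maxSize = 2_000_000_000;      // illustrative budget below Integer.MAX_VALUE
        int size = 1_900_000_000;         // bytes already reserved
        int requestedSize = 500_000_000;  // next map output to reserve

        // The int sum wraps to a negative number, so the check says "no need to wait".
        System.out.println("int sum:  " + (size + requestedSize));
        System.out.println("int wait: " + mustWaitInt(size, requestedSize, maxSize));

        // The long sum is the true total, so the check correctly says "wait".
        System.out.println("long sum:  " + ((long) size + (long) requestedSize));
        System.out.println("long wait: " + mustWaitLong(size, requestedSize, maxSize));
    }
}
```

        With these values the int version reports that the request fits even though the true total is above the budget, which matches the observed behaviour of every copy being reserved in memory.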

        Amareshwari Sriramadasu added a comment -

        Can you give more details like the values for configuration parameters such as mapred.job.shuffle.input.buffer.percent and mapred.inmem.merge.threshold, if you are not using default values?

        Amareshwari Sriramadasu added a comment -

        Sorry, I didn't see your previous comment. Please ignore my last comment.

        Chandra Prakash Bhagtani added a comment -

        This patch is against the hadoop-0.20.0 release.

        Amar Kamat added a comment -

        Amazing!!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12423911/HADOOP-6357.patch
        against trunk revision 832249.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/124/console

        This message is automatically generated.

        Arun C Murthy added a comment -

        The patch is generated incorrectly; we also need to fix this in trunk.

        Chris Douglas added a comment -

        Patches changing shuffle arithmetic to use longs instead of ints. Retains the restriction on in-memory segments to maxint, though lifting that constraint can/should be explored in another issue.

        Including unit tests for this is impractical, but it will be tested manually.
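
        The approach described here, doing the shuffle budget arithmetic in long while still capping a single in-memory segment at maxint, might look roughly like the sketch below. It is not taken from the patches: the class, the method names, and the fractions used in main are assumptions for illustration only.

```java
// Hypothetical sketch of long-based shuffle limits with an int-sized
// cap on any single in-memory segment. None of these names come from
// the actual patch.
final class ShuffleMemoryLimits {

    // Total shuffle buffer in bytes, computed without ever narrowing to int.
    static long memoryLimit(long maxHeapBytes, float bufferFraction) {
        return (long) (maxHeapBytes * (double) bufferFraction);
    }

    // Largest single segment allowed in memory: a fraction of the total,
    // but never more than Integer.MAX_VALUE (the retained restriction).
    static long inMemorySegmentCap(long memoryLimit, float segmentFraction) {
        long byFraction = (long) (memoryLimit * (double) segmentFraction);
        return Math.min(byFraction, Integer.MAX_VALUE);
    }

    // A map output is shuffled in memory only if it is below the cap;
    // otherwise it would go to disk.
    static boolean fitsInMemory(long requestedSize, long segmentCap) {
        return requestedSize < segmentCap;
    }

    public static void main(String[] args) {
        long heap = 16L << 30;                       // 16 GiB heap, illustrative
        long limit = memoryLimit(heap, 0.5f);        // 8 GiB shuffle budget
        long cap = inMemorySegmentCap(limit, 0.5f);  // 4 GiB by fraction, capped to maxint

        System.out.println("limit = " + limit);
        System.out.println("cap   = " + cap);
        System.out.println("3 GB segment in memory? " + fitsInMemory(3_000_000_000L, cap));
    }
}
```

        Keeping the totals in long means a budget above 2 GiB no longer wraps, while the cap keeps each individual segment small enough to be held in a single Java byte array.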

        Chris Douglas added a comment -

        (arranging patches for Hudson)

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12424074/M1182-0.patch
        against trunk revision 832362.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/console

        This message is automatically generated.

        Jothi Padmanabhan added a comment -

        Minor nits – TestReduceFetch* is still using job.setInt to set JobContext.REDUCE_MEMORY_TOTAL_BYTES instead of job.setLong. Also, in 20, it looks like the method getMemoryLimit is not getting called at all. Should we remove that?

        Chris Douglas added a comment -

        Addressed Jothi's feedback. Thanks for the review

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12424117/M1182-1.patch
        against trunk revision 833006.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/128/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/128/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/128/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/128/console

        This message is automatically generated.

        Jothi Padmanabhan added a comment -

        +1

        Amareshwari Sriramadasu added a comment -

        I could reproduce the OOM on the Yahoo! distribution by running loadgen. With M1182-1v20.patch applied, the issue no longer occurs.

        Chris Douglas added a comment -

        I committed this. Thanks, Chandra!

        Chris Douglas added a comment -

        Thanks to Amareshwari for testing this at scale

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #134 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/134/)

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #162 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/162/)


          People

          • Assignee:
            Chandra Prakash Bhagtani
            Reporter:
            Chandra Prakash Bhagtani
          • Votes:
            0
            Watchers:
            8
