Hadoop Map/Reduce / MAPREDUCE-1248

Redundant memory copying in StreamKeyValUtil

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: contrib/streaming
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I found that when MROutputThread collects the Reducer's output, it calls StreamKeyValUtil.splitKeyVal(), which allocates two local byte arrays for each line of output. These two byte arrays are then passed to the key and val variables. The data is therefore copied twice: once by System.arraycopy() and again inside key.set() / val.set().

      This doubles the amount of memory copying for the entire output (which may increase CPU consumption) and causes frequent allocation of temporary objects.
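To make the double copy concrete, here is a minimal, self-contained Java sketch of the pre-patch pattern described above. It deliberately avoids Hadoop: `ReusableText` is a hypothetical stand-in for `org.apache.hadoop.io.Text` whose `set()` copies the given bytes into an internal buffer, and the class name `SplitBeforePatch` and the counters `copiedBytes` / `tempCopiedBytes` are illustration-only additions. The `splitKeyVal` body follows this issue's description (two temporary arrays filled by `System.arraycopy()`, then handed to `key.set()` / `val.set()`); it is an approximation, not the verbatim Hadoop source.

```java
import java.nio.charset.StandardCharsets;

public class SplitBeforePatch {

  /** Hypothetical stand-in for Hadoop's Text: set() copies bytes into a reusable buffer. */
  static class ReusableText {
    static long copiedBytes = 0; // illustration only: total bytes copied by set()
    private byte[] bytes = new byte[0];
    private int length;

    void set(byte[] utf8) {
      set(utf8, 0, utf8.length);
    }

    void set(byte[] utf8, int start, int len) {
      if (bytes.length < len) {
        bytes = new byte[len]; // grow only when the current buffer is too small
      }
      System.arraycopy(utf8, start, bytes, 0, len);
      length = len;
      copiedBytes += len;
    }

    @Override
    public String toString() {
      return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
  }

  static long tempCopiedBytes = 0; // illustration only: bytes copied into temporary arrays

  /** Pre-patch pattern: two temporary arrays per line, so every payload byte is copied twice. */
  static void splitKeyVal(byte[] utf, int start, int length, ReusableText key,
                          ReusableText val, int splitPos, int separatorLength) {
    int keyLen = splitPos - start;
    byte[] keyBytes = new byte[keyLen];                    // temporary allocation
    System.arraycopy(utf, start, keyBytes, 0, keyLen);     // first copy of the key
    int valLen = (start + length) - splitPos - separatorLength;
    byte[] valBytes = new byte[valLen];                    // temporary allocation
    System.arraycopy(utf, splitPos + separatorLength, valBytes, 0, valLen); // first copy of the value
    tempCopiedBytes += keyLen + valLen;
    key.set(keyBytes);                                     // second copy of the key
    val.set(valBytes);                                     // second copy of the value
  }

  public static void main(String[] args) {
    byte[] line = "apple\t42".getBytes(StandardCharsets.UTF_8);
    ReusableText key = new ReusableText();
    ReusableText val = new ReusableText();
    splitKeyVal(line, 0, line.length, key, val, 5, 1); // the tab is at byte index 5
    System.out.println(key + " | " + val);             // prints "apple | 42"
    System.out.println("bytes copied: " + (tempCopiedBytes + ReusableText.copiedBytes)); // prints "bytes copied: 14"
  }
}
```

The 7 payload bytes of `apple\t42` (key plus value, excluding the tab) are copied 14 times in total, and two short-lived arrays are allocated for that one line.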

        Activity

        Ruibang He created issue -
        Ruibang He added a comment -

        I have often observed the Reducers' memory usage during the reduce phase climb to the heap limit and then drop, repeatedly. This pattern is usually caused by frequent allocation of temporary objects, and it hurts performance because the GC has to run constantly.

        Ruibang He added a comment -

        I suggest removing the two local byte arrays and replacing the following code:

        key.set(keyBytes);
        val.set(valBytes);

        with:

        key.set(utf, start, keyLen);
        val.set(utf, splitPos+separatorLength, valLen);

        I have done a simple test of the above on my cluster. It works, and memory usage no longer keeps climbing.

        Any thoughts?
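A minimal sketch of the proposed change, using the same offset-based calls as the suggestion above (`key.set(utf, start, keyLen)` and `val.set(utf, splitPos+separatorLength, valLen)`). It is self-contained rather than built on Hadoop: `ReusableText` is a hypothetical stand-in for `org.apache.hadoop.io.Text` whose three-argument `set()` copies a slice of the input into a buffer it reuses, and `SplitAfterPatch` and the `copiedBytes` counter exist only for illustration. This is not the committed patch itself.

```java
import java.nio.charset.StandardCharsets;

public class SplitAfterPatch {

  /** Hypothetical stand-in for Hadoop's Text: set() copies bytes into a reusable buffer. */
  static class ReusableText {
    static long copiedBytes = 0; // illustration only: total bytes copied by set()
    private byte[] bytes = new byte[0];
    private int length;

    void set(byte[] utf8, int start, int len) {
      if (bytes.length < len) {
        bytes = new byte[len]; // grow only when the current buffer is too small
      }
      System.arraycopy(utf8, start, bytes, 0, len);
      length = len;
      copiedBytes += len;
    }

    @Override
    public String toString() {
      return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
  }

  /** Proposed pattern: no temporaries; each set() copies straight out of the line buffer. */
  static void splitKeyVal(byte[] utf, int start, int length, ReusableText key,
                          ReusableText val, int splitPos, int separatorLength) {
    int keyLen = splitPos - start;
    int valLen = (start + length) - splitPos - separatorLength;
    key.set(utf, start, keyLen);                        // the only copy of the key
    val.set(utf, splitPos + separatorLength, valLen);   // the only copy of the value
  }

  public static void main(String[] args) {
    // The same key/val objects are reused across lines, so their buffers are reused too.
    ReusableText key = new ReusableText();
    ReusableText val = new ReusableText();
    for (String s : new String[] {"apple\t42", "fig\t7"}) {
      byte[] line = s.getBytes(StandardCharsets.UTF_8);
      // indexOf works as a byte offset here only because the sample lines are ASCII.
      splitKeyVal(line, 0, line.length, key, val, s.indexOf('\t'), 1);
      System.out.println(key + " | " + val);
    }
    System.out.println("bytes copied: " + ReusableText.copiedBytes);
  }
}
```

For these two lines it prints `apple | 42`, `fig | 7` and `bytes copied: 11`: each of the 11 payload bytes is copied exactly once, and no per-line temporary arrays are allocated, because the shorter second line fits into the buffers grown for the first.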

        ZhuGuanyin added a comment -

        The same thing happens in KeyValueLineRecordReader.java when its next() method is called.

        Ruibang He added a comment -

        Thanks, Guanyin. The latest trunk has already fixed the problem in KeyValueLineRecordReader.java, but it still exists in StreamKeyValUtil.java. I have attached a patch as an early solution.

        Ruibang He added a comment -

        An early solution

        Ruibang He made changes -
        Field Original Value New Value
        Attachment MAPREDUCE-1248-v1.0.patch [ 12426511 ]
        Amareshwari Sriramadasu added a comment -

        Patch looks good.
        Submitting for hudson.

        Amareshwari Sriramadasu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12426511/MAPREDUCE-1248-v1.0.patch
        against trunk revision 960808.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The patch appears to cause tar ant target to fail.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        The "-1 contrib tests" result is due to MAPREDUCE-1834 and MAPREDUCE-1375.

        The javac failure needs investigation.

        Amareshwari Sriramadasu added a comment -

        I could not work out the cause of the javac error from the console output. I re-ran test-patch on my local machine, and there are no javac warnings.
        test-patch result:

             [exec]
             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
             [exec]                         Please justify why no new tests are needed for this patch.
             [exec]                         Also please list what manual steps were performed to verify this patch.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Amareshwari Sriramadasu added a comment -

        I just committed this. Thanks, Ruibang!

        Amareshwari Sriramadasu made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.22.0 [ 12314184 ]
        Resolution Fixed [ 1 ]
        Ruibang He added a comment -

        You're welcome, Amareshwari

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

        Konstantin Shvachko made changes -
        Assignee Ruibang He [ ruibang ]
        Affects Version/s 0.22.0 [ 12314184 ]

          People

          • Assignee:
            Ruibang He
            Reporter:
            Ruibang He
          • Votes:
            0
            Watchers:
            6
