Hadoop Map/Reduce: MAPREDUCE-1248

Redundant memory copying in StreamKeyValUtil

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: contrib/streaming
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I found that when MROutputThread collects the output of the Reducer, it calls StreamKeyValUtil.splitKeyVal(), which allocates two local byte arrays for each line of output. These two byte arrays are then passed to the variables key and val. The data is therefore copied twice: once by System.arraycopy() and once inside key.set() / val.set().

      This doubles the amount of memory copying for the entire output (which may lead to higher CPU consumption) and causes frequent temporary object allocation.
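
The pattern described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual Hadoop source or the attached patch: `Buf` is a hypothetical stand-in for a writable whose `set()` copies bytes into its own buffer, and the method names and parameters are assumptions chosen for illustration. The first method copies each field twice through temporary arrays; the second passes an offset and length so each field is copied once.

```java
import java.nio.charset.StandardCharsets;

class SplitKeyValSketch {
    /** Hypothetical stand-in for a writable: set() copies bytes into an internal buffer. */
    static final class Buf {
        private byte[] bytes = new byte[0];
        private int length;

        void set(byte[] src, int start, int len) {
            if (bytes.length < len) {
                bytes = new byte[len];          // grow only when the buffer is too small
            }
            System.arraycopy(src, start, bytes, 0, len);
            length = len;
        }

        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Pattern from the report: temporary arrays are filled, then set() copies them again. */
    static void splitWithTempArrays(byte[] line, int lineLen, int splitPos, int sepLen,
                                    Buf key, Buf val) {
        int valLen = lineLen - splitPos - sepLen;
        byte[] keyBytes = new byte[splitPos];   // temporary allocation per line
        byte[] valBytes = new byte[valLen];     // temporary allocation per line
        System.arraycopy(line, 0, keyBytes, 0, splitPos);               // first copy
        System.arraycopy(line, splitPos + sepLen, valBytes, 0, valLen); // first copy
        key.set(keyBytes, 0, keyBytes.length);                          // second copy
        val.set(valBytes, 0, valBytes.length);                          // second copy
    }

    /** One possible alternative: hand set() the offsets, so each field is copied once. */
    static void splitDirect(byte[] line, int lineLen, int splitPos, int sepLen,
                            Buf key, Buf val) {
        key.set(line, 0, splitPos);
        val.set(line, splitPos + sepLen, lineLen - splitPos - sepLen);
    }

    public static void main(String[] args) {
        byte[] line = "k1\tv1".getBytes(StandardCharsets.UTF_8);
        Buf key = new Buf();
        Buf val = new Buf();
        splitDirect(line, line.length, 2, 1, key, val); // the tab separator is at index 2
        System.out.println(key + " -> " + val);
    }
}
```

Both methods leave the same contents in `key` and `val`; the second simply skips the intermediate arrays. Whether this is possible in practice depends on the target type offering a `set()` overload that takes an offset and a length.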

        Activity

        Ruibang He created issue

        Ruibang He made changes
            Field               Original Value               New Value
            Attachment          (empty)                      MAPREDUCE-1248-v1.0.patch [ 12426511 ]

        Amareshwari Sriramadasu made changes
            Field               Original Value               New Value
            Status              Open [ 1 ]                   Patch Available [ 10002 ]

        Amareshwari Sriramadasu made changes
            Field               Original Value               New Value
            Status              Patch Available [ 10002 ]    Resolved [ 5 ]
            Hadoop Flags        (empty)                      [Reviewed]
            Fix Version/s       (empty)                      0.22.0 [ 12314184 ]
            Resolution          (empty)                      Fixed [ 1 ]

        Konstantin Shvachko made changes
            Field               Original Value               New Value
            Assignee            (empty)                      Ruibang He [ ruibang ]
            Affects Version/s   (empty)                      0.22.0 [ 12314184 ]

          People

          • Assignee: Ruibang He
          • Reporter: Ruibang He
          • Votes: 0
          • Watchers: 6

