Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1248

Redundant memory copying in StreamKeyValUtil

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: contrib/streaming
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I found that when MROutputThread collecting the output of Reducer, it calls StreamKeyValUtil.splitKeyVal() and two local byte-arrays are allocated there for each line of output. Later these two byte-arrays are passed to variable key and val. There are twice memory copying here, one is the System.arraycopy() method, the other is inside key.set() / val.set().

      This causes double times of memory copying for the whole output (may lead to higher CPU consumption), and frequent temporay object allocation.

        Attachments

          Activity

            People

            • Assignee:
              ruibang Ruibang He
              Reporter:
              ruibang Ruibang He
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: