Hadoop Map/Reduce / MAPREDUCE-1248

Redundant memory copying in StreamKeyValUtil

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: contrib/streaming
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I found that when MROutputThread collects the output of the Reducer, it calls StreamKeyValUtil.splitKeyVal(), which allocates two local byte arrays for each line of output. These two byte arrays are then passed to the variables key and val. The data is therefore copied twice: once by System.arraycopy(), and again inside key.set() / val.set().

      This doubles the amount of memory copying for the whole output (which may lead to higher CPU consumption) and causes frequent allocation of temporary objects.
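
      The pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual Hadoop source: Buf is a hypothetical stand-in for the key/val type, SplitSketch and its method names are invented for this example, and the parameter list (utf, start, length, splitPos, sepLen) is an assumption about how the line and separator position are passed in. It assumes the target type offers a set(bytes, start, len) overload, which is what lets the temporary arrays be dropped.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the key/val buffer type; not the real Hadoop class. */
final class Buf {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Copies len bytes starting at src[start] into the internal buffer, reusing it when large enough. */
    void set(byte[] src, int start, int len) {
        if (bytes.length < len) {
            bytes = new byte[len];
        }
        System.arraycopy(src, start, bytes, 0, len);
        length = len;
    }

    /** Copies the whole source array into the internal buffer. */
    void set(byte[] src) {
        set(src, 0, src.length);
    }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}

final class SplitSketch {
    /** Pattern from the report: copy into temporary arrays, then copy again inside set(). */
    static void splitWithTempArrays(byte[] utf, int start, int length,
                                    Buf key, Buf val, int splitPos, int sepLen) {
        int keyLen = splitPos - start;
        int valLen = (start + length) - splitPos - sepLen;
        byte[] keyBytes = new byte[keyLen];                 // temporary allocation per line
        System.arraycopy(utf, start, keyBytes, 0, keyLen);  // first copy
        byte[] valBytes = new byte[valLen];                 // temporary allocation per line
        System.arraycopy(utf, splitPos + sepLen, valBytes, 0, valLen);
        key.set(keyBytes);                                  // second copy
        val.set(valBytes);                                  // second copy
    }

    /** One possible alternative: hand set() the offsets so each byte is copied once. */
    static void splitDirect(byte[] utf, int start, int length,
                            Buf key, Buf val, int splitPos, int sepLen) {
        int keyLen = splitPos - start;
        int valLen = (start + length) - splitPos - sepLen;
        key.set(utf, start, keyLen);
        val.set(utf, splitPos + sepLen, valLen);
    }
}
```

      Both methods leave key and val with the same contents; the second one skips the two per-line temporary arrays and the extra copy into them.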


          People

          • Assignee:
            Ruibang He
          • Reporter:
            Ruibang He
          • Votes:
            0
          • Watchers:
            6

            Dates

            • Created:
              Updated:
              Resolved:
