  1. Hadoop Map/Reduce
  2. MAPREDUCE-1248

Redundant memory copying in StreamKeyValUtil

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: contrib/streaming
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When MROutputThread collects the output of the Reducer, it calls StreamKeyValUtil.splitKeyVal(), which allocates two local byte arrays for every line of output. These two arrays are then passed to the variables key and val. The data is therefore copied twice: once by System.arraycopy() into the temporary arrays, and once more inside key.set() / val.set().

      As a result, the whole output is copied twice (which may lead to higher CPU consumption), and temporary objects are allocated frequently.
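
      The difference can be illustrated with a minimal, self-contained sketch. It does not reproduce the attached patch or the actual Hadoop source. Buf is a hypothetical stand-in for a reusable holder such as key and val, assumed to offer a set(bytes, offset, length) overload that copies straight from the source array. The method names splitWithTemporaries and splitDirect are also made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class SplitKeyValSketch {

    /** Hypothetical stand-in for a reusable byte holder (not the Hadoop class). */
    static final class Buf {
        private byte[] bytes = new byte[0];
        private int length;

        /** Copies len bytes starting at src[off]; grows the buffer only when needed. */
        void set(byte[] src, int off, int len) {
            if (bytes.length < len) {
                bytes = new byte[len];
            }
            System.arraycopy(src, off, bytes, 0, len);
            length = len;
        }

        /** Copies the whole array. */
        void set(byte[] src) {
            set(src, 0, src.length);
        }

        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Pattern described in the issue: two temporaries and two copies per field. */
    static void splitWithTemporaries(byte[] line, int start, int length,
                                     Buf key, Buf val, int splitPos, int sepLen) {
        int keyLen = splitPos - start;
        int valLen = (start + length) - splitPos - sepLen;
        byte[] keyBytes = new byte[keyLen];                              // temporary #1
        System.arraycopy(line, start, keyBytes, 0, keyLen);              // copy 1 (key)
        byte[] valBytes = new byte[valLen];                              // temporary #2
        System.arraycopy(line, splitPos + sepLen, valBytes, 0, valLen);  // copy 1 (val)
        key.set(keyBytes);                                               // copy 2 (key)
        val.set(valBytes);                                               // copy 2 (val)
    }

    /** Possible alternative: one copy per field, no temporary arrays. */
    static void splitDirect(byte[] line, int start, int length,
                            Buf key, Buf val, int splitPos, int sepLen) {
        int keyLen = splitPos - start;
        int valLen = (start + length) - splitPos - sepLen;
        key.set(line, start, keyLen);
        val.set(line, splitPos + sepLen, valLen);
    }

    public static void main(String[] args) {
        byte[] line = "user42\tclicked".getBytes(StandardCharsets.UTF_8);
        int splitPos = 6; // index of the tab separator in this sample line
        Buf key = new Buf();
        Buf val = new Buf();
        splitDirect(line, 0, line.length, key, val, splitPos, 1);
        System.out.println(key + " | " + val);
    }
}
```

      Both methods leave key and val with the same contents. The direct variant skips the intermediate arrays, so each output line costs one copy per field instead of two and creates no short-lived byte arrays.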

      Attachments

        1. MAPREDUCE-1248-v1.0.patch
          1.0 kB
          Ruibang He

          People

            Assignee: Ruibang He
            Reporter: Ruibang He
            Votes: 0
            Watchers: 6
