  1. Hadoop Common
  2. HADOOP-3226

Run combiner when merging spills from map output


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note:
      Changed policy for running combiner. The combiner may be run multiple times as the map's output is sorted and merged. Additionally, it may be run on the reduce side as data is merged. The old semantics are available in Hadoop 0.18 if the user calls:
      job.setCombineOnlyOnce(true);

    Description

      When merging spills from the map, running the combiner should further diminish the volume of data we send to the reduce.
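
To illustrate the idea, here is a minimal, self-contained sketch in plain Java. It is not the code from the attached patches and does not use the Hadoop API. The class `SpillMergeSketch`, its methods, and the sample data are all hypothetical. It models each spill as a key-sorted list of records, merges the spills, and applies a summing combiner to the merged run. It also applies the combiner to each spill before the merge, to show that a combiner written this way gives the same result however many times it runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SpillMergeSketch {

    // Merge already-sorted spills into one sorted run. A real merge would be a
    // streaming k-way merge; concatenating and sorting keeps the sketch short.
    static List<Map.Entry<String, Integer>> merge(List<List<Map.Entry<String, Integer>>> spills) {
        List<Map.Entry<String, Integer>> merged = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> spill : spills) {
            merged.addAll(spill);
        }
        merged.sort(Map.Entry.comparingByKey());
        return merged;
    }

    // Hypothetical combiner: sums the values of adjacent records with equal keys.
    // Its output has the same record type as its input, so it can be re-applied.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> sorted) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> rec : sorted) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).getKey().equals(rec.getKey())) {
                int sum = out.get(last).getValue() + rec.getValue();
                out.set(last, Map.entry(rec.getKey(), sum));
            } else {
                out.add(rec);
            }
        }
        return out;
    }

    // Three made-up spills, each sorted by key.
    static List<List<Map.Entry<String, Integer>>> sampleSpills() {
        return List.of(
            List.of(Map.entry("a", 1), Map.entry("a", 1), Map.entry("b", 1)),
            List.of(Map.entry("a", 1), Map.entry("c", 1)),
            List.of(Map.entry("b", 1), Map.entry("c", 1)));
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> spills = sampleSpills();

        // Merge only: every record from every spill reaches the output.
        List<Map.Entry<String, Integer>> plain = merge(spills);

        // Merge, then combine once over the merged run.
        List<Map.Entry<String, Integer>> combinedOnce = combine(plain);

        // Combine each spill first, merge, then combine again.
        List<List<Map.Entry<String, Integer>>> preCombined = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> spill : spills) {
            preCombined.add(combine(spill));
        }
        List<Map.Entry<String, Integer>> combinedTwice = combine(merge(preCombined));

        System.out.println("records without combining: " + plain.size());
        System.out.println("records after combining:   " + combinedOnce.size());
        System.out.println("same result when combined twice: "
            + combinedOnce.equals(combinedTwice));
    }
}
```

The sketch relies on the combiner being associative and commutative, and on its output type matching its input type. A combiner that assumes it runs exactly once per record (for example, one that counts its invocations) would produce different results under the changed policy.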

      Attachments

        1. 3226-0.patch
          2 kB
          Christopher Douglas
        2. 3226-1.patch
          15 kB
          Christopher Douglas
        3. 3226-2.patch
          15 kB
          Christopher Douglas
        4. 3226-3.patch
          13 kB
          Christopher Douglas

          People

            Assignee: Christopher Douglas (cdouglas)
            Reporter: Christopher Douglas (cdouglas)