Hadoop Map/Reduce / MAPREDUCE-4961

Local MapReduce jobs should also go through ShuffleConsumerPlugin to enable different MergeManager implementations

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      MAPREDUCE-4049 provides the ability to plug in a custom Shuffle, and MAPREDUCE-4080 extends Shuffle so that it can supply different MergeManager implementations.

      While using these pluggable features, I found that when a MapReduce job runs locally, a RawKeyValueIterator is returned directly from a static call to Merger.merge. This breaks the assumption that the Shuffle may provide different merge methods, even though there is no copy phase in this situation.

      My use case is a hash-based MergeManager, which does not need a sort on the map side. When the MapReduce job runs locally, the hash-based MergeManager never gets a chance to be used, because the reduce side goes directly to Merger.merge. This leaves the pluggable Shuffle and MergeManager incomplete.

      So we need to move the code that calls Merger.merge from ReduceTask into the ShuffleConsumerPlugin implementation, so that the Shuffle implementation can decide how to do the merge and return the corresponding iterator.
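
      The proposed change can be illustrated with a minimal, self-contained sketch. It does not use the real Hadoop classes and is not taken from the attached patches: ShufflePluginSketch, SortMergeShuffle, HashMergeShuffle, group, and reduce are hypothetical names, and records are modelled as plain String[] {key, value} pairs. The point it tries to show is only that the reduce side asks the plugin for an iterator in both the local and the distributed case, so a hash-based merge can replace the sort-based one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are real Hadoop APIs.
class LocalShuffleSketch {

    /** Stand-in for a shuffle plugin: it alone decides how map outputs are merged. */
    interface ShufflePluginSketch {
        // isLocal only signals that there is no copy phase; the merge stays the plugin's choice.
        Iterator<Map.Entry<String, List<String>>> run(List<List<String[]>> mapOutputs, boolean isLocal);
    }

    /** Sort-based merge, standing in for the call that ReduceTask currently makes itself. */
    static class SortMergeShuffle implements ShufflePluginSketch {
        public Iterator<Map.Entry<String, List<String>>> run(List<List<String[]>> mapOutputs, boolean isLocal) {
            List<String[]> all = new ArrayList<>();
            for (List<String[]> out : mapOutputs) {
                all.addAll(out);
            }
            all.sort((a, b) -> a[0].compareTo(b[0])); // order records by key before grouping
            return group(all).entrySet().iterator();
        }
    }

    /** Hash-based merge: groups values by key without sorting. */
    static class HashMergeShuffle implements ShufflePluginSketch {
        public Iterator<Map.Entry<String, List<String>>> run(List<List<String[]>> mapOutputs, boolean isLocal) {
            List<String[]> all = new ArrayList<>();
            for (List<String[]> out : mapOutputs) {
                all.addAll(out);
            }
            return group(all).entrySet().iterator(); // keys appear in arrival order
        }
    }

    /** Collects the values of each key, preserving the order in which keys are first seen. */
    static Map<String, List<String>> group(List<String[]> records) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] kv : records) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    /** Reduce side: no direct merge call; local and distributed runs both go through the plugin. */
    static List<String> reduce(ShufflePluginSketch shuffle, List<List<String[]>> mapOutputs, boolean isLocal) {
        List<String> result = new ArrayList<>();
        Iterator<Map.Entry<String, List<String>>> it = shuffle.run(mapOutputs, isLocal);
        while (it.hasNext()) {
            Map.Entry<String, List<String>> e = it.next();
            result.add(e.getKey() + "=" + e.getValue().size()); // toy reducer: count values per key
        }
        return result;
    }
}
```

      In this sketch, passing a HashMergeShuffle instead of a SortMergeShuffle to reduce changes the merge strategy without touching the reduce code, which is the property the local code path currently lacks.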

      Attachments

        1. MAPREDUCE-4961.patch
          25 kB
          Haifeng Chen
        2. MAPREDUCE-4961.patch
          16 kB
          Haifeng Chen

          People

            Assignee: Haifeng Chen (jerrychenhf)
            Reporter: Haifeng Chen (jerrychenhf)
            Votes: 0
            Watchers: 5

              Time Tracking

                Original Estimate: 72h
                Remaining Estimate: 72h
                Time Spent: Not Specified