Hadoop Map/Reduce / MAPREDUCE-2910

Allow empty MapOutputFile segments

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2, 0.23.0
    • Fix Version/s: None
    • Component/s: task, tasktracker
    • Labels: None

    Description

      As clusters and jobs grow larger, we see many empty partitions in MapOutputFile, caused by large reducer counts or partition skew. When map output compression is enabled, each empty map output partition gets larger and incurs additional compressor/decompressor initialization overhead.
      This can be optimized by allowing empty MapOutputFile segments, in which the rawLength and partLength of the IndexRecord are both 0. Corresponding support needs to be added to the IFile reader, the IFile writer, and the reduce shuffle copier.
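
The proposal can be illustrated with a minimal, self-contained sketch. It is not Hadoop's actual IFile or IndexRecord code: the class `EmptySegmentSketch`, the `IndexEntry` type, and all method names are hypothetical, gzip stands in for whatever map output codec is configured, and the real segment format is simplified to a bare compressed byte stream. The sketch shows a writer that records a zero-length index entry for an empty partition without creating a compressor, and a reader that returns immediately for such an entry without creating a decompressor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch only; names and layout are assumptions, not Hadoop code.
final class EmptySegmentSketch {

    /** Simplified stand-in for an index record: segment start plus its two lengths. */
    static final class IndexEntry {
        final long startOffset; // where the segment begins in the output file
        final long rawLength;   // uncompressed size in bytes
        final long partLength;  // size actually occupied in the file

        IndexEntry(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }

        /** An empty segment is one where both lengths are 0. */
        boolean isEmpty() {
            return rawLength == 0 && partLength == 0;
        }
    }

    /** Size of a gzip stream that carries no data: header and trailer alone take space. */
    static int compressedSizeOfNothing() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new GZIPOutputStream(buf).close();
        return buf.size();
    }

    /** Writer side: appends one segment per partition, skipping the compressor for empty ones. */
    static List<IndexEntry> writeSegments(List<byte[]> partitions, ByteArrayOutputStream file)
            throws IOException {
        List<IndexEntry> index = new ArrayList<>();
        for (byte[] data : partitions) {
            long start = file.size();
            if (data.length == 0) {
                // Empty partition: write nothing, record zero lengths.
                index.add(new IndexEntry(start, 0, 0));
                continue;
            }
            GZIPOutputStream gz = new GZIPOutputStream(file);
            gz.write(data);
            gz.close(); // writes the gzip trailer; closing a ByteArrayOutputStream is a no-op
            index.add(new IndexEntry(start, data.length, file.size() - start));
        }
        return index;
    }

    /** Reader/copier side: returns early for empty segments, so no decompressor is created. */
    static byte[] readSegment(byte[] file, IndexEntry entry) throws IOException {
        if (entry.isEmpty()) {
            return new byte[0];
        }
        ByteArrayInputStream slice =
                new ByteArrayInputStream(file, (int) entry.startOffset, (int) entry.partLength);
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(slice)) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                raw.write(chunk, 0, n);
            }
        }
        return raw.toByteArray();
    }
}
```

In this sketch an empty partition occupies no bytes in the file, so its index entry shares its start offset with the next segment, whereas `compressedSizeOfNothing()` shows that compressing an empty partition still produces a non-empty stream. A real change would also have to handle whatever framing the actual segment format writes around the data, which this sketch leaves out.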

          People

            Assignee: Unassigned
            Reporter: Binglin Chang (decster)
            Votes: 1
            Watchers: 3