Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2, 0.23.0
    • Fix Version/s: None
    • Component/s: task, tasktracker
    • Labels:
      None

      Description

      As clusters and jobs grow larger, we see many empty partitions in MapOutputFile, caused by large reducer counts or partition skew. When map output compression is enabled, an empty map output partition becomes larger on disk and adds compressor/decompressor initialization overhead.
      This can be optimized by allowing empty MapOutputFile segments, in which the rawLength and partLength of the IndexRecord are both 0. Corresponding support needs to be added to the IFile reader and writer and to the reduce shuffle copier.
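
      A minimal sketch of the idea, not a patch against Hadoop: the class EmptySegmentSketch, its nested IndexRecord, and the methods writePartitions and shouldFetch are hypothetical stand-ins, and java.util.zip's Deflater is used only as an example codec. The writer records an index entry with rawLength = partLength = 0 for an empty partition and never touches the compressor; the copier side can then skip the fetch based on the index alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

final class EmptySegmentSketch {

    /** Hypothetical index entry: where a segment starts, its raw size, its on-disk size. */
    static final class IndexRecord {
        final long startOffset;
        final long rawLength;
        final long partLength;

        IndexRecord(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }

        /** An empty segment is one whose raw and on-disk lengths are both 0. */
        boolean isEmpty() {
            return rawLength == 0 && partLength == 0;
        }
    }

    /** Writes each partition compressed; empty partitions write nothing at all. */
    static List<IndexRecord> writePartitions(byte[][] partitions, ByteArrayOutputStream out) {
        List<IndexRecord> index = new ArrayList<>();
        for (byte[] partition : partitions) {
            long start = out.size();
            if (partition.length == 0) {
                // No compressor is created and no bytes are written.
                index.add(new IndexRecord(start, 0, 0));
                continue;
            }
            Deflater deflater = new Deflater();
            try {
                DeflaterOutputStream compressed = new DeflaterOutputStream(out, deflater);
                compressed.write(partition);
                compressed.finish(); // flush compressed bytes, keep 'out' open
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                deflater.end(); // release the compressor's native resources
            }
            index.add(new IndexRecord(start, partition.length, out.size() - start));
        }
        return index;
    }

    /** Copier-side check: an empty segment needs no fetch and no decompressor. */
    static boolean shouldFetch(IndexRecord record) {
        return !record.isEmpty();
    }

    public static void main(String[] args) {
        byte[][] partitions = {
            "key1\tvalue1".getBytes(StandardCharsets.UTF_8),
            new byte[0], // a reducer that receives no records from this map
            "key2\tvalue2".getBytes(StandardCharsets.UTF_8),
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<IndexRecord> index = writePartitions(partitions, out);
        for (int i = 0; i < index.size(); i++) {
            IndexRecord r = index.get(i);
            System.out.println("partition " + i + ": start=" + r.startOffset
                + " raw=" + r.rawLength + " part=" + r.partLength
                + " fetch=" + shouldFetch(r));
        }
    }
}
```

      In this sketch the empty segment shares its start offset with the following segment, so a reader must rely on the zero lengths, not on the offset, to detect it.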

            People

            • Assignee:
              Unassigned
              Reporter:
              decster Binglin Chang
            • Votes:
              1
              Watchers:
              3