Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2, 0.23.0
    • Fix Version/s: None
    • Component/s: task, tasktracker
    • Labels:
      None

      Description

      As clusters and jobs grow in scale, we see many empty partitions in MapOutputFile, caused by large reducer counts or partition skew. When map output compression is enabled, empty map output partitions get larger and incur additional compressor/decompressor initialization overhead.
      This can be optimized by allowing empty MapOutputFile segments, in which both the rawLength and the partLength of the IndexRecord equal 0. Corresponding support needs to be added to the IFile reader, the IFile writer, and the reduce shuffle copier.
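
      A minimal sketch of the idea, not the actual Hadoop code: the IndexRecord class below is a simplified stand-in for Hadoop's index record, and writePartitions and needsFetch are hypothetical helpers invented for illustration. java.util.zip's Deflater stands in for the configured map output codec. The writer emits no bytes for an empty partition and records rawLength = partLength = 0; a reader or shuffle copier could then check the index and skip the segment without creating a decompressor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

class EmptySegmentSketch {

    /** Simplified stand-in for an index entry (hypothetical, not Hadoop's class). */
    static final class IndexRecord {
        final long startOffset; // where the segment starts in the output file
        final long rawLength;   // uncompressed length of the segment
        final long partLength;  // on-disk (compressed) length of the segment

        IndexRecord(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }

        /** An empty segment is marked by both lengths being 0. */
        boolean isEmpty() {
            return rawLength == 0 && partLength == 0;
        }
    }

    /**
     * Writes partitions back to back into out. Non-empty partitions are
     * compressed; empty ones write nothing and never create a compressor.
     */
    static List<IndexRecord> writePartitions(List<byte[]> partitions,
                                             ByteArrayOutputStream out) throws IOException {
        List<IndexRecord> index = new ArrayList<>();
        for (byte[] data : partitions) {
            long start = out.size();
            if (data.length == 0) {
                // Empty segment: zero bytes on disk, zero lengths in the index.
                index.add(new IndexRecord(start, 0, 0));
                continue;
            }
            Deflater deflater = new Deflater();
            try {
                DeflaterOutputStream compressed = new DeflaterOutputStream(out, deflater);
                compressed.write(data);
                compressed.finish(); // flush compressed bytes, keep out open
            } finally {
                deflater.end(); // release native compressor resources
            }
            index.add(new IndexRecord(start, data.length, out.size() - start));
        }
        return index;
    }

    /** Copier-side check (hypothetical): empty segments need no fetch or decompressor. */
    static boolean needsFetch(IndexRecord record) {
        return !record.isEmpty();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<byte[]> partitions = Arrays.asList(
                "some records".getBytes(), new byte[0], "more records".getBytes());
        for (IndexRecord r : writePartitions(partitions, out)) {
            System.out.println("offset=" + r.startOffset + " raw=" + r.rawLength
                    + " part=" + r.partLength + " fetch=" + needsFetch(r));
        }
    }
}
```

      In this sketch an empty segment shares its start offset with the next segment, so existing offset arithmetic over the index keeps working; whether the real index format tolerates that is an assumption that would need checking against the IFile code.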

        Activity

        Allen Wittenauer made changes -
        Fix Version/s 0.24.0 [ 12317654 ]
        Arun C Murthy made changes -
        Field Original Value New Value
        Fix Version/s 0.24.0 [ 12317654 ]
        Fix Version/s 0.23.0 [ 12315570 ]
        Binglin Chang created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Binglin Chang
          • Votes:
            1
            Watchers:
            3

            Dates

            • Created:
              Updated: