Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 0.20.2, 0.23.0
Fix Version/s: None
Component/s: None
Description
As clusters and jobs grow larger, we see many empty partitions in the MapOutputFile, caused by large numbers of reducers or by partition skew. When map output compression is enabled, these empty partitions become larger on disk and also incur the overhead of initializing a compressor (on the map side) and a decompressor (on the reduce side) for each of them.
This can be optimized by allowing empty MapOutputFile segments, i.e. segments whose IndexRecord has both rawLength and partLength equal to 0. Corresponding support needs to be added to the IFile reader and writer and to the reduce-side shuffle copier.
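A minimal sketch of the proposed idea, under stated assumptions: `IndexEntry` is a hypothetical, simplified stand-in for an index record (it is not Hadoop's `IndexRecord` class), and `java.util.zip` Deflate stands in for whatever map output codec is configured. The writer emits an index entry with `rawLength == partLength == 0` and no data bytes for an empty partition, so no compressor is created; the reader returns immediately for such an entry, so no decompressor is created and nothing needs to be fetched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch: empty map output segments recorded only in the index. */
public class EmptySegmentSketch {

    /** Simplified, hypothetical index entry (not Hadoop's IndexRecord). */
    public static final class IndexEntry {
        public final long startOffset;
        public final long rawLength;   // uncompressed length of the segment
        public final long partLength;  // on-disk (compressed) length of the segment

        public IndexEntry(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }

        /** An empty segment is marked by both lengths being 0. */
        public boolean isEmpty() {
            return rawLength == 0 && partLength == 0;
        }
    }

    /** Compresses one segment with Deflate (stand-in for the configured codec). */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream seg = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(seg)) {
            dos.write(raw);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return seg.toByteArray();
    }

    /** Size of a compressed stream for empty input: not zero, because of codec framing. */
    public static int compressedSizeOfEmptyInput() {
        return compress(new byte[0]).length;
    }

    /** Writer side: empty partitions get an index entry but no bytes and no compressor. */
    public static List<IndexEntry> writePartitions(List<byte[]> partitions, ByteArrayOutputStream out) {
        List<IndexEntry> index = new ArrayList<>();
        for (byte[] raw : partitions) {
            long start = out.size();
            if (raw.length == 0) {
                index.add(new IndexEntry(start, 0, 0));
                continue;
            }
            byte[] compressed = compress(raw);
            out.write(compressed, 0, compressed.length);
            index.add(new IndexEntry(start, raw.length, compressed.length));
        }
        return index;
    }

    /** Reader side: an empty segment is returned without creating a decompressor. */
    public static byte[] readSegment(byte[] file, IndexEntry e) {
        if (e.isEmpty()) {
            return new byte[0];
        }
        try (InflaterInputStream in = new InflaterInputStream(
                new ByteArrayInputStream(file, (int) e.startOffset, (int) e.partLength))) {
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                raw.write(buf, 0, n);
            }
            return raw.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Shuffle side: only non-empty segments need to be copied to the reducer. */
    public static int segmentsToFetch(List<IndexEntry> index) {
        int n = 0;
        for (IndexEntry e : index) {
            if (!e.isEmpty()) {
                n++;
            }
        }
        return n;
    }
}
```

For example, writing three partitions where only the middle one holds data yields two index entries with both lengths 0, one segment to fetch, and a data file containing only the middle segment's compressed bytes. `compressedSizeOfEmptyInput()` illustrates the size concern from the description: compressing an empty partition still produces a non-empty stream.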