Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
In BypassMergeSortShuffleWriter, we may end up opening disk writers, and therefore creating files, for empty partitions. This happens because we manually call open() right after creating each writer, which creates the serialization and compression streams. These streams may write headers to the underlying output stream, so partitions that contain no records still produce non-zero-length files. The eager call is unnecessary, because the disk object writer opens itself automatically when the first write is performed. Removing the eager open() call, and rewriting the consumers to cope with empty partitions whose files do not exist, results in a large performance benefit for certain sparse workloads when using sort-based shuffle.
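A minimal Python sketch of the eager-versus-lazy behaviour described above follows. It is not Spark's code. gzip stands in for Spark's serialization and compression streams because `gzip.open()` emits a header and `close()` emits a trailer even when nothing is written. `LazyPartitionWriter`, `read_partition`, `write_partitions`, and `demo` are hypothetical names used only for illustration.

```python
import gzip
import os
import tempfile
from typing import List, Optional, Tuple


class LazyPartitionWriter:
    """Hypothetical per-partition writer. With eager=False, the file and its
    gzip stream are created only on the first write, so a partition with no
    records leaves no file behind."""

    def __init__(self, path: str, eager: bool = False) -> None:
        self.path = path
        self._stream: Optional[gzip.GzipFile] = None
        if eager:
            # Mimics the eager open() call: the file is created and the gzip
            # header is emitted even if no record is ever written.
            self._open()

    def _open(self) -> None:
        self._stream = gzip.open(self.path, "wb")

    def write(self, record: bytes) -> None:
        if self._stream is None:
            self._open()  # open lazily on the first record
        self._stream.write(record + b"\n")

    def close(self) -> None:
        if self._stream is not None:
            self._stream.close()  # also writes the gzip trailer
            self._stream = None


def read_partition(path: str) -> List[bytes]:
    """Hypothetical consumer that treats a missing file as an empty partition."""
    if not os.path.exists(path):
        return []
    with gzip.open(path, "rb") as f:
        return f.read().splitlines()


def write_partitions(out_dir: str, records_by_partition: List[List[bytes]],
                     eager: bool) -> List[str]:
    """Writes one file per partition and returns the expected file paths."""
    paths = []
    for pid, records in enumerate(records_by_partition):
        path = os.path.join(out_dir, f"part-{pid}.gz")
        writer = LazyPartitionWriter(path, eager=eager)
        for record in records:
            writer.write(record)
        writer.close()
        paths.append(path)
    return paths


def demo(eager: bool) -> List[Tuple[Optional[int], List[bytes]]]:
    """Writes three partitions (the middle one empty) and reports, per
    partition, the file size (None if no file exists) and the records read back."""
    with tempfile.TemporaryDirectory() as d:
        paths = write_partitions(d, [[b"a", b"b"], [], [b"c"]], eager)
        return [(os.path.getsize(p) if os.path.exists(p) else None,
                 read_partition(p)) for p in paths]
```

In this sketch, `demo(eager=True)` leaves a small non-empty file for the empty middle partition, while `demo(eager=False)` creates no file for it. Both read back the same records, because `read_partition` treats a missing file as empty. The lazy path costs one null check per write. In exchange, empty partitions never touch the filesystem, which is where sparse workloads with many empty partitions save work.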
Issue Links
- duplicates: SPARK-11225 Prevent generate empty file (Resolved)
- is duplicated by: SPARK-12400 Avoid writing a shuffle file if a partition has no output (empty) (Resolved)