Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Later
None
None
None
Description
PySpark's shuffles typically operate on Java RDDs whose elements are byte arrays. We should implement a custom Serializer for use in these shuffles. This would let PySpark take advantage of shuffle optimizations such as SPARK-7311 without requiring users to change the default serializer to KryoSerializer, which is useful for JobServer-type applications.
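A minimal sketch of the idea, assuming the shuffle records are plain `byte[]` values and that the optimization in question depends on serialized records being self-delimiting and freely reorderable in the output stream. General-purpose Java serialization writes a stream header and can emit back-references, so its bytes cannot simply be moved around; a length-prefixed encoding avoids both. The class `ByteArrayRecordCodec` and its methods are hypothetical and are not Spark's API. A real implementation would presumably wrap this logic in subclasses of Spark's `Serializer` and `SerializerInstance`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Spark's API): a length-prefixed encoding for
 * byte[] records, the kind of format a byte-array-only shuffle serializer
 * could use instead of general-purpose Java serialization.
 */
public class ByteArrayRecordCodec {

    /** Writes one record as a 4-byte big-endian length followed by the raw bytes. */
    public static void writeRecord(DataOutputStream out, byte[] record) throws IOException {
        out.writeInt(record.length);
        out.write(record);
    }

    /** Encodes every record into one byte stream with no stream-level header. */
    public static byte[] encodeAll(List<byte[]> records) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (byte[] r : records) {
                writeRecord(out, r);
            }
        }
        return buffer.toByteArray();
    }

    /** Decodes records until the input is exhausted. */
    public static List<byte[]> decodeAll(byte[] encoded) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException end) {
                    break; // input ended where the next length prefix would start
                }
                byte[] record = new byte[length];
                in.readFully(record);
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Encode two records separately, then concatenate them in swapped order.
        byte[] first = encodeAll(List.of(new byte[] {1, 2, 3}));
        byte[] second = encodeAll(List.of(new byte[] {4, 5}));
        byte[] swapped = new byte[first.length + second.length];
        System.arraycopy(second, 0, swapped, 0, second.length);
        System.arraycopy(first, 0, swapped, second.length, first.length);

        // Each record is self-delimiting, so the reordered bytes still decode.
        List<byte[]> decoded = decodeAll(swapped);
        System.out.println(decoded.size()); // prints 2
    }
}
```

The design choice here is that each record carries its own length and nothing else, so a shuffle writer could sort or move serialized records as opaque byte ranges without deserializing them.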
Issue Links
- relates to: SPARK-3132 Avoid serialization for Array[Byte] in TorrentBroadcast (Closed)