Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
TorrentBroadcast is unnecessarily complicated:
1. It tracks a lot of mutable state, such as the total number of bytes and the number of blocks fetched.
2. It has at least two data structures that are not needed: TorrentInfo and TorrentBlock.
3. It uses getSingle on executors to fetch the block instead of getLocal, which costs an extra round trip to look up the block's location when the block doesn't exist yet.
4. It has a metadata block that is completely unnecessary.
5. It does an extra memory copy during deserialization to copy all the blocks into a single giant array.
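To illustrate point 5, here is a minimal sketch of deserializing from a sequence of blocks without first joining them into one large array. It is written in Python purely for illustration and is not Spark's code; `ChainedReader`, `blockify`, and `deserialize_blocks` are hypothetical names, and `pickle` stands in for whatever serializer is actually used.

```python
import io
import pickle


class ChainedReader(io.RawIOBase):
    """Read-only stream that walks a list of byte blocks in order,
    without first concatenating them into one large buffer."""

    def __init__(self, blocks):
        super().__init__()
        self._blocks = iter(blocks)
        self._current = memoryview(b"")

    def readable(self):
        return True

    def readinto(self, buf):
        # Skip exhausted (or empty) blocks; report EOF when none remain.
        while len(self._current) == 0:
            nxt = next(self._blocks, None)
            if nxt is None:
                return 0
            self._current = memoryview(nxt)
        # Copy only as much as the caller's buffer can hold.
        n = min(len(buf), len(self._current))
        buf[:n] = self._current[:n]
        self._current = self._current[n:]
        return n


def blockify(payload, block_size):
    """Split a serialized payload into fixed-size blocks (hypothetical helper)."""
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]


def deserialize_blocks(blocks):
    """Deserialize directly from the chained blocks."""
    return pickle.load(io.BufferedReader(ChainedReader(blocks)))
```

The reader still copies bytes into a small read buffer, but it never allocates an array the size of the whole broadcast value, which is the copy the description objects to.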
Issue Links
- is related to SPARK-3115 Improve task broadcast latency for small tasks (Resolved)
- links to