Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Version: 3.3.0
Description
In TorrentBroadcast.writeBlocks, we store the unserialized broadcast object in addition to its serialized version:
    private def writeBlocks(value: T): Int = {
      import StorageLevel._
      // Store a copy of the broadcast variable in the driver so that tasks run on the driver
      // do not create a duplicate copy of the broadcast variable's value.
      val blockManager = SparkEnv.get.blockManager
      if (!blockManager.putSingle(broadcastId, value, MEMORY_AND_DISK, tellMaster = false)) {
        throw new SparkException(s"Failed to store $broadcastId in BlockManager")
      }
In the case of broadcast relations, these objects can be fairly large (60MB in one observed case), and the unserialized copy is not strictly necessary on the driver.
Add an option to skip storing the unserialized version of these objects on the driver.
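
The proposal can be illustrated with a small standalone sketch. This is not Spark's implementation: ToyBlockStore, BroadcastSketch, the keepUnserialized flag, and the block naming scheme are all hypothetical names invented for this example, and plain Java serialization stands in for whatever serializer Spark uses. It only shows the idea that the serialized pieces are always written, while the unserialized copy is stored only when the flag asks for it.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

// Hypothetical stand-in for a block store; this is NOT Spark's BlockManager.
final class ToyBlockStore {
  val objects = mutable.Map.empty[String, Any]         // unserialized values
  val bytes = mutable.Map.empty[String, Array[Byte]]   // serialized pieces
}

object BroadcastSketch {
  private def serialize(value: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  private def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Always writes the serialized pieces; stores the unserialized object
  // only when keepUnserialized is true. Returns the number of pieces.
  def writeBlocks(
      store: ToyBlockStore,
      id: String,
      value: AnyRef,
      blockSize: Int,
      keepUnserialized: Boolean): Int = {
    if (keepUnserialized) {
      store.objects(id) = value
    }
    val pieces = serialize(value).grouped(blockSize).toSeq
    pieces.zipWithIndex.foreach { case (piece, i) =>
      store.bytes(s"${id}_piece$i") = piece
    }
    pieces.length
  }

  // Prefers the unserialized copy if present; otherwise rebuilds the value
  // from the serialized pieces.
  def readValue[T](store: ToyBlockStore, id: String, numBlocks: Int): T =
    store.objects.get(id) match {
      case Some(v) => v.asInstanceOf[T]
      case None =>
        val pieces = (0 until numBlocks).map(i => store.bytes(s"${id}_piece$i"))
        deserialize[T](Array.concat(pieces: _*))
    }
}
```

With keepUnserialized = false, the store holds only the serialized pieces, so memory is not spent on a second copy of a large relation. The trade-off in this sketch is that a later read on the same node has to pay the deserialization cost.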