TorrentBroadcast uses a couple of "back doors" into the block manager to write and read data.
What these block manager methods have in common is that they bypass the encryption code. As a result, broadcast data is stored unencrypted in the block manager, and it is written to disk unencrypted if those blocks have to be evicted from memory.
The correct fix here is actually not to change TorrentBroadcast, but to fix the block manager so that:
- data stored in memory is not encrypted
- data written to disk is encrypted
This would simplify the code paths that use the BlockManager / SerializerManager APIs (e.g. see SPARK-19520), but it requires some tricky changes inside the BlockManager so that it can keep using file channels and avoid reading whole blocks back into memory just to decrypt them.
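As a rough illustration of the proposed split (plaintext in memory, ciphertext on disk, streaming decryption through a file channel), here is a minimal standalone sketch. It is not Spark code: `ToyBlockStore` and all of its methods are hypothetical names, and AES/CTR with a per-file IV stored as a 16-byte prefix is an assumption made only for this example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical toy store (not Spark's BlockManager): plaintext in memory, encrypted on disk. */
public class ToyBlockStore {
    // Assumption for this sketch: AES in CTR mode, with the IV stored as a file prefix.
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";
    private static final int IV_LENGTH = 16;

    private final Map<String, byte[]> memory = new HashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final Path dir;
    private final SecretKey key;

    public ToyBlockStore(Path dir, SecretKey key) {
        this.dir = dir;
        this.key = key;
    }

    /** Memory path: bytes are stored as-is, with no encryption. */
    public void putInMemory(String blockId, byte[] data) {
        memory.put(blockId, data.clone());
    }

    /** Returns the in-memory copy of the block, or null if it is not in memory. */
    public byte[] getFromMemory(String blockId) {
        return memory.get(blockId);
    }

    /** Eviction path: the block leaves memory and is encrypted while being written to disk. */
    public Path evictToDisk(String blockId) throws IOException, GeneralSecurityException {
        byte[] data = memory.remove(blockId);
        if (data == null) {
            throw new IllegalArgumentException("Block not in memory: " + blockId);
        }
        byte[] iv = new byte[IV_LENGTH];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));

        Path file = dir.resolve(blockId);
        try (OutputStream raw = Files.newOutputStream(file)) {
            raw.write(iv); // the IV is not secret; only the payload is encrypted
            try (OutputStream encrypted = new CipherOutputStream(raw, cipher)) {
                encrypted.write(data);
            }
        }
        return file;
    }

    /**
     * Read path: wraps a file channel in a decrypting stream, so the caller can
     * consume the block incrementally instead of loading the whole file first.
     * The caller is responsible for closing the returned stream.
     */
    public InputStream openFromDisk(String blockId) throws IOException, GeneralSecurityException {
        FileChannel channel = FileChannel.open(dir.resolve(blockId), StandardOpenOption.READ);
        try {
            InputStream raw = Channels.newInputStream(channel);
            byte[] iv = new byte[IV_LENGTH];
            new DataInputStream(raw).readFully(iv); // consume the IV prefix
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
            return new CipherInputStream(raw, cipher);
        } catch (IOException | GeneralSecurityException | RuntimeException e) {
            channel.close(); // do not leak the channel if setup fails
            throw e;
        }
    }
}
```

In this sketch a caller would put a block with `putInMemory`, call `evictToDisk` when memory is tight, and later read it back in small chunks from `openFromDisk`. The part the description calls tricky, transferring data between file channels without pulling it into user-space buffers, is not covered here: a decrypting stream necessarily passes the bytes through the cipher.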